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Preface 


This book is written for the applied scientist and engineer who wants or 
needs to learn about a subject but is not an expert in the specific field. It is 
also written to accompany a first graduate course in digital signal processing. 
In this book we have selected the field of adaptive filtering, which is an 
important part of statistical signal processing. The adaptive filters have 
found use in many and diverse fields such as communications, control, radar, 
sonar, seismology, etc. 

The aim of this book is to present an introduction to optimum filtering 
as well as to provide an introduction to realizations of linear adaptive filters 
with finite duration impulse response. Since the signals involved are ran- 
dom, an introduction to random variables and stochastic processes are also 
presented. ; 

The book contains all the material necessary for the reader to study its 
contents. An appendix on matrix computations is also included at the end 
of the book to provide supporting material. The book includes a number of 
MATLAB® functions and m-files for practicing and verifying the material in 
the text. These programs are designated as Book MATLAB Functions. The 
book includes many computer experiments to illustrate the underlying the- 
ory and applications of the Wiener and adaptive filtering. Finally, at the end 
of each chapter (except the first introductory chapter) numerous problems 
are provided to help the reader develop a deeper understanding of the 
material presented. The problems range in difficulty from undemanding 
exercises to more elaborate problems. Detailed solutions or hints and sug- 
gestions for solving all of these problems are also provided. 

Additional material is available from the CRC Web site, www.crc- 
press.com. Under the menu Electronic Products (located on the left side of 
the screen), click Downloads & Updates. A list of books in alphabetical order 
with Web downloads will appear. Locate this book by a search or scroll down 
to it. After clicking on the book title, a brief summary of the book will appear. 
Go to the bottom of this screen and click on the hyperlinked “Download” 
that is in a zip file. 

MATLAB?’ is a registered trademark of The Math Works, Inc. and is used 
with permission. The Math Works does not warrant the accuracy of the text 
or exercises in this book. This book’s use or discussion of MATLAB® software 
or related products does not constitute endorsement or sponsorship by The 


Math Works of a particular pedagogical approach or particular use of the 
MATLAB? software. 
For product information, please contact: 


The Math Works, Inc. 

3 Apple Hill Drive 

Natick, MA 01760-2098 USA 
Tel: 508-647-7000 

Fax: 508-647-7001 

E-mail: info@mathworks.com 
Web: www.mathworks.com 
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chapter 1 


Introduction 


1.1 Signal processing 


In numerous applications of signal processing and communications we are 
faced with the necessity to remove noise and distortion from the signals. 
These phenomena are due to time-varying physical processes, which some- 
times are unknown. One of these situations is during the transmission of a 
signal (message) from one point to another. The medium (wires, fibers, 
microwave beam, etc.), which is known as the channel, introduces noise and 
distortion due to the variations of its properties. These variations may be 
slow varying or fast varying. Since most of the time the variations are 
unknown, it is the use of adaptive filtering that diminishes and sometimes 
completely eliminates the signal distortion. 

The most common adaptive filters, which are used during the adaptation 
process, are the finite impulse response filters (FIR) types. These are prefer- 
able because they are stable, and no special adjustments are needed for their 
implementation. | 

The adaptation approaches, which we will introduce in this book, are: 
the Wiener approach, the least-mean-square algorithm (LMS), and the 
least-squares (LS) approach. 


1.2 An example 


One of the problems that arises in several applications is the identification 
of a system or, equivalently, finding its input-output response relationship. 
To succeed in determining the filter coefficients that represent a model of 
the unknown system, we set a system configuration as shown in Figure 1.2.1. 

The input signal, {x(7)}, to the unknown system is the same as the one 
entering the adaptive filter. The output of the unknown system is the desired 
signal, {d(n)}. From the analysis of linear time-invariant systems (LTI), we 
know that the output of linear time-invariant systems is the convolution of 
their input and their impulse response. 
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e(n) = d (n) — y(n) 


Figure 1.2.1 System identification. 


Let us assume that the unknown system is time invariant, which indi- 
cates that the coefficients of its impulse response are constants and of finite 
extent (FIR). Hence, we write 


N-I 
d(n) = yh, x(n—k) (1.2.1) 
k=0 


The output of an adaptive FIR filter with the same number of coefficients, 
N, is given by 


y(n) = Y wxi —k) (1.2.2) 


For these two systems to be equal, the difference e(n) = d(n) — y(n) must be 
equal to zero. Under these conditions the two sets of coefficients are equal. 
It is the method of adaptive filtering that will enable us to produce an error, 
e(n), approximately equal to zero and, therefore, will identify that w,’s = h,’s. 


1.3 Outline of the text 


Our purpose in this text is to present the fundamental aspects of adaptive 
filtering and to give the reader the understanding of how an algorithm, LMS, 
works for different types of applications. These applications include system 
identification, noise reduction, echo cancellation during telephone conver- 
sation, inverse system modeling, interference canceling, equalization, spec- 
trum estimation, and prediction. In order to aid the reader in his or her 
understanding of the material presented in this book, an extensive number 
of MATLAB functions were introduced. These functions are identified with 
the words “Book MATLAB Function.” Ample numbers of examples and 
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figures are added in the text to facilitate the understanding of this particular 
important signal processing technique. At the end of each chapter, except 
this introductory chapter, we provided many problems and either complete 
solutions or hints and suggestions for solving them. 

We have tried to provide all the needed background for understanding 
the idea of adaptive filters and their uses in practical applications. Writing 
the text, we assumed that the reader will have knowledge at the level of a 
bachelor’s degree in electrical engineering. Although only a small amount 
of new results is included in this text, its utility of the presented material 
should be judged by the form of presentation and the successful transferring 
of the fundamental ideas of adaptive filtering and their use in different areas 
of research and development. 

To accomplish the above mentioned goals, we have started introducing 
digital signals and their representation in the frequency domain and z-transform 
domain in Chapter 2. Next, we present in block diagram form the three 
fundamental discrete systems: finite impulse response (FIR), infinite impulse 
response (IIR), and the combined system known as the autoregressive mean 
average (ARMA). 

Since most of the input signals in applications of adaptive filtering are 
random signals, we introduce the notion of random variables, random 
sequences, and stochastic processes in Chapter 3. Furthermore, we introduce 
the concepts, and the approaches of finding the power spectral density of 
random signals. 

Chapter 4 develops the foundation for determining minimum mean-square 
error (MSE) filters. The chapter introduces the Wiener filter, and the 
“bowl-shaped” error surface. The Wiener filter is also used in a special 
configuration named self-correcting filtering. 

Since the magnitude of the difference between the maximum and min- 
imum value of the eigenvalues of the correlation matrix plays an important 
role in the rate of convergence of adaptation, Chapter 5 introduces the theory 
and properties of the eigenvalues and the properties of the error surface. 

Chapter 6 introduces the following two gradient search methods: the 
Newton method and the steepest descent method. A derivation of the con- 
vergence properties of the steepest descent method is presented, as well as 
the valuable geometric analogy of finding the minimum point of the 
“bowl-shaped” error surface. 

Chapter 7 introduces the most celebrated algorithm of adaptive filtering, 
the LMS algorithm. The LMS algorithm approximates the method of steepest 
descent. In addition, many examples are presented using the algorithm in 
diverse applications, such as communications, noise reduction, system iden- 
tification, etc. 

Chapter 8 presents a number of variants of the LMS algorithm, which 
have been developed since the introduction of the LMS algorithm. 

The last chapter, Chapter 9, covers the least squares and recursive least 
squares signal processing. 

Finally, an Appendix was added to present elements of matrix analysis. 


chapter 2 


Discrete-time signal 
processing 


2.1 Discrete-time signals 


Discrete-time signals are seldom found in nature. Therefore, in almost all cases, 
we will be dealing with the digitization of continuous signals. This process will 
produce a sequence {x(7T)} from a continuous signal x(t) that is sampled at 
equal time distance T. The sampling theorem tells us that, for signals that have a 
finite spectrum (band-limited signals) and whose highest frequency is @y 
(known as the Nyquist frequency), the sampling frequency œ, must be twice as 
large as the Nyquist frequency or, equivalently, the sampling time T must be 
less than one half of its Nyquist time, 27/@y. In our studies we will consider 
that all the signals are band-limited. This is a reasonable assumption since we 
can always pass them through a lowpass filter (pre-filtering). The next section 
discusses further the frequency spectra of sampled functions. 


Basic discrete-time signals 
A set of basic continuous and the corresponding discrete signals are included 
in Table 2.1.1. 


Table 2.1.1 Continuous and Discrete-Time Signals 


Delta Function 


O(t) = 0 f#0 
f seat =l 


1 n=0 
(nT) = 
0 


n#0 


(Continued ) 
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Table 2.1.1 Continuous and Discrete-Time Signals (Continued) 


Unit Step Function 


1 f{>0 
u(t) = 
0 t<0 
1 n20 
u(nT)= 
0 n<0 


The Exponential Function 


x(t) =e" u(t) 


x(nT) =e"! u(nT) =(e ’ u(nT) = b"u(nT) 


2.2 Transform-domain representation 
of discrete-time signals 


Discrete-time Fourier transform (DTFT) 


Any non-periodic discrete-time signal x(f) with finite energy has a discrete-time 
Fourier transform (DTFT), which is found by an approximation of the Fourier 
transform. The transform is found as follows: 


f oo , Z nT 
X(e'") = F{x(nT)} = | x(te "dt = și x(t)dt 
—00 a nT-T 
(2.2.1) 
D x(nT)e 


where the exact integral from nT — T to nT has been replaced by the 
approximate area T x x(nT). When T = 1, the above equation takes the 
form 


X(e!”) = 5 xne (2.2.2) 


yli=—o0 


The relation X(e°*”) = X(ePe*7)= X(e®) indicates that the DTFT produces 
spectra that are periodic with period 2z. 


Example 2.2.1: Plot the magnitude and phase spectra for the time func- 
tion x(n) = 0.9"u(n). 
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Solution: The DTFT of x(n) is given by 


ule) = > 0.9" 


n=0 


= > (0.92) Se — 
= 1-0.9e" 


1 
= (1-0.9 cos @)+j0.9sin w 


ee ee SS a meoo = Aloe 


(1—0.9cos @)* +(0.9sin oY pjtan™ (0 9sinw/(1-0.9 coso) 


A(œw)=1⁄4 (1-0.9 cosæY +(0.9sin oY ), 
olw) =- tan” (0.9sin w/(1 - 0.9 cos@)) 


where A(q@) is the magnitude spectrum and @(@) is the phase spectrum. In the 
development we used the Euler’s identity e*} = cos(0) + jsin(@). These two 
spectra are shown in Figure 2.2.1. The plots are shown only for the period 
- — <<, whichis the standard presentation of the spectra for digital signals. 


10 
8 
© 6 
<= 4 
2 
0 
4 =3 9 zi 0 1 2 3 4 
@ 
(a) 
1.5 
i 
0.5 
S o 
©- 
-0.5 
=j 
=15 
-3 2) =] 0 1 2 3 4 
wo 


Figure 2.2.1 Illustration of Example 2.2.1. 
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The discrete Fourier transform (DFT) 


The N-point discrete Fourier transform (DFT) of a finite signal x(t) sampled 
at times T apart is given by 


N-1 . 20 
x(a 2 Jar} zane "x" k=0,1,2,---,N-1 (2.2.3) 
The inverse DFT is given by 
N-} 20 
1 QIU. Ron 
T)=— ) X(k—Je N =0,1,2,---,N-1 2.2.4 
x(n) = È (ke Non (2.2.4) 


For T = 1, the above equations become 


(2.2.5) 


If we replace k in (2.2.5) by k + N we find that X(A) is periodic with period 
N. Similarly, if we introduce n + N instead of n in the same equation, we 
find that x(n) is also periodic with period N. This indicates that the DFT is 
associated with periodic sequences in both time and frequency domain. To 
obtain the DFT of a sequence and the inverse DFT (IDFT), we use the 
following MATLAB functions: 


XxX=fft (x,N); 
x=ifft (X,N); 


X is an N-point sequence (vector) and x is the input time sequence (vector). 
The middle point N/2 of X is known as the fold-over frequency. Plotting X 
vs. frequency (vs. k2z/N), where 27/N is the bin frequency, we find that 7 
is the middle point and the frequency spectrum from z to 27 is a reflection 
of the spectrum from 0 to m. Similarly we find that 2/T is the fold over 
frequency when T is different than one. This indicates that to approximate 
the spectrum of a continuous signal close enough, we must use sampling 
time T that is very small. 


Example 2.2.2: Find the DFT of the signal x(t) = exp(-f)u(#). 
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Solution: First we observe that we cannot take the DFT of the signal 
from t = 0 to infinity. Because of this limitation, let us find the DFT of the 
signal from t = 0 to t = 4 and with sampling time T = 1 and T = 0.5. The 
results will be compared with the exact FT of the signal. The FT of the signal 
is given by 


X(o) = [ exp(-t)u(£)exp(—jat)dt = J exp(-(1+ja)E)dt 


[exp(—(1 + jo) 








7 —(1+ jo) 
= 2 [0-1]= : = = 1/2 ee A(a)e* 
1+jo@ 1l+j@ (1+0*) 


where the polar form of the complex variable 1+ j@= 1° + @’ exp[j tan” (2)] 
was used. To find the DFT of the signal for T = 1, we used the following 
Book MATLAB program: 


t=0:1:4; 

x=exp(-t); 

dftxl=fft(x,32); % asked for 32 spectrum bins, MATLAB will 
pad 32-5 zeros the vector x; 

w=0:2*pi/32:2*pi-(2*pi/32); 

subplot (211);stem(w,abs (dftx1)); 

FT1=1./sqrt(itw.*2);hold on;stem(w,abs(FT1),'filled'); 

xlabel('\omega');ylabel('Magnitudes of FT and DFT'); 

title('(a)');axis([0 2*pi 0 2]);legend('DFT','FT'); 


Next, we used the same Book MATLAB program but with a sampling 
time T = 1/2. Hence, the following changes were introduced into the program: 


ne=020..5 943 

x=exp({-nt); 

dftx2=0.5*fft(x,32); 

w=0 : 4*pi/32:4*pi-(4*pi/32); 

subplot (212);stem(w,abs(dftx2)); 

FT2=1./sqrt (laws "24> 

hold on;stem(w,abs(FT2),'filled'); 
xlabel('\omega');ylabel('Magnitudes of FT and DFT'); 
title('(b)}');axis([0 4*pi 0 1.5]);legend('DFT','FT'); 


Figure 2.2.2a indicates the following: (a) The amplitude of the FT of 
the continuous signal is constantly decreasing as it should be. (b) The 
amplitude of the DFT of x(t), for the range 0<t<4 has a folding fre- 
quency 7/1 (T = 1 in this case). (c) The approximation by the DFT to the 
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Magnitudes of FT and DFT 





Oo 


YVOOOQOQOOO 


Magnitudes of FT and DFT 





Figure 2.2.2 Illustration of Example 2.2.2. 


exact spectrum is up to the folding frequency. (d) For this set of condi- 
tions there is a considerable difference in the two spectra. (e) If the DFT 
command was fft(x,64), we would have found that the total points for 
the same frequency range would have been twice as many but the 
accuracy would have remained the same. (f) If, however, we had 
increased the time range greater than 4, then the accuracy would have 
improved. 

Figure 2.2.26 shows improvement to the accuracy of the spectrum 
due to decreasing of the sampling interval to T = 0.5 (increasing the 
sampling frequency, @,=27/T). If, in addition, we had increased the time 
range greater than 4, the approximation would have been even better. 
However, since the largest frequency of this signal is infinite, no finite 
value of T would have been able to produce the exact spectrum. If, on 
the other hand, we had band-limited signal with its largest frequency @y 
(Nyquist frequency), then we would have been able to sample the time 
function at twice its Nyquist frequency, and the two spectra would have 
been identical. 
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Note that, although the spectrum of the continuous signal is continuous, 
we have only plotted the values of the continuous spectrum at the same 
frequency bins as the discrete one. 


2.3 The Z-Transform 


The z-transform of a sequence is a transformation of a discrete-time 
function from its time domain to its frequency domain (and/or 
z-domain). It is also a powerful tool to study linear time-invariant discrete 
systems. The z-transform is defined by the following equation (see also 
Problem 2.3.1): 


X(z) = Z{x(n)} = > x(n)z™” (2.3.1) 


The above equation and its inverse constitute a pair. However, when the 
inverse of the transformed function is needed, we will use tables. 


Example 2.3.1: Find the z-transform of the function x(n) = 0.9"u(n). 


Solution: The z-transform is given by 
1 


~ 1-0.9z7 
(2.3.2) 


X(z)= dose “=> (0.927)" =1+ 0.927! + (0.922)? +. 


n=0 


where we used the geometric series property (1+ x +x? + ...)=1/(1 — x) if 
|x! <1. For the sequence to be summed, the following inequality must be 
satisfied 


0.9z7<1 or lz|>0.9 


The absolute value of z is the distance from the origin of the complex plane 
to a circle on the same plane with radius |z|. The whole region |z1> 0.9 of 
the z-plane for this example is known as the region of convergence (ROC). 
The ROC is important if we are asked to find the inverse z-transform by 
integration in the complex plane. If we set z = exp(j@) in (2.3.1), we obtain 
the DTFT for the time function x(n). 


There exists a family of z-transforms for which X(z)’s are rational 
functions of z or, equivalently, zt. The roots of the numerator and denom- 
inator of a rational function are known as the zeros and poles, respectively. 


12 
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Table 2.3.1 Properties of the Z-Transform 


Property Time domain Z-domain 
Notation x(n) X(z) 
x(n) X,(Z) 
x(n) X(n) 
Linearity ax,(n)+bx,(n) aX,(z)+bX,(Z) 
Time shifting x{n—k) BX (z) 
Scaling a x(n) X(a-z) 
Conjugation x*(n) X*(z*) 
Differentiation nx(n) —z(dX(z)/dz) 
Convolution x(n)*h(n) X(z)H(z) 
Initial value (if a sequence x(n,) = 2" xo 


is zero for n < no) 


Final value (if x(%) exists) lim x(n) = lim(1 —z")X(z) 


For example, the rational function in the above example has a zero at 0 
and a pole at 0.9. The location of the poles and zeros in the complex plane 
play an important role in the study of discrete signals and systems. Some 
fundamental properties and some pairs of z-transforms are given in Table 2.3.1 
and Table 2.3.2, respectively. 


Table 2.3.2 Z-Transform Pairs 
Time domain Z-domain ROC 





n) 1 all z 

&n—k) Zs all z 
z 

u(n) z ej zi>] 
Z 

nu(n) C-D? izl>1 
bs 

a"u(n) a Izl>a 

2 
— T 
cos(n@l)u(n) seed chose Fal 





z*—2zcoswT +1 
zsin or 
z?’ —2zcos@r +1 





sin(Hol )u(n) Izl>1 


Z 
na"u(n) (z—ay lzl>a 
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2.4 Discrete-time systems 


A discrete time-invariant system is a physical device or an algorithm that 
transforms an input (or excitation) signal {x(n)} into another one, called the 
output signal, {y(n)}. Every discrete system is defined by its response, known 
as the impulse response h(n) of the system, to a unit impulse input &n), 
which is defined as follows: 


, n=0 
O(n) = (2.4.1) 
0 n#0 


The relationship that provides the output of a system, given its input, 
is a mathematical relationship that maps the input to the output. This rela- 
tionship is given by the convolution operation 


y(n) = x(n) * h(n) = y x(m)h(n-— m) = y h(m)x(n— m) (2.4.2) 


m= Hi=—oo 


The above equation indicates the following operations required to find the 
output y(n). (a) We select another domain m for x(n) and A(n). (b) In this 
domain, we represent one of the functions as it was in the n domain by 
simply substituting each n with m. (c) We flip the other function in m domain 
(see the minus sign in front of m) and shift it by n. (d) Next, we first multiply 
the two sequences term by term, as they were arranged by the shifting 
process, and then add the results. (e) The result is the output of the system 
at time n. (f) We repeat the same procedure for all n’s and, thus, we find the 
output for all times from minus infinity to infinity. 


Example 2.4.1: Find the convolution of the following two functions: 
f(n) = u(n) and h(n) = a"u(n), lal< 1. 


Solution: If we flip and shift the unit step function, we observe that when 
n < 0, the two functions do not overlap and, hence, the output is zero. 
Therefore, we find the output only if n is equal to or greater than zero. Hence, 
(2.4.2) for this case takes the form 





y(n) = X un- ma" = "Sa" =|t+ata*+--t+a"= l 


m=0 m=0 


where we used the formula for a finite geometric series. The step function 
u(n — m) is zero for m > n and is equal to one for m <n. 
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The two most important properties of the convolution operation are: 


Linearity: 
9(n) = (f (1) + h(n) * y(n) = f(n) * y(n) + h(n) * y(n) (2.4.3) 

z-transform: 
ZL f(n) *h(n)} = Z{h(n)* f(n) = F()H(2) (2.4.4) 


Transform-domain representation 


Based on the above development, the output y(n) in the z-domain of any 
system having impulse response A(n) and an input x(n) is given by 


Y(z) = H(z)X(z) (2.4.5) 


H(z) is known as the system function and plays a fundamental role in the 
analysis and characterization of LTI systems. If the poles of H(z) are inside 
the unit circle, the system is stable and H(e'®) provides its frequency response. 

A realizable discrete LTI and causal (its impulse response is zero for 
n < 0) system can be represented by the following linear difference equation 


q 
y(n) + Y atony(n —m)= ` b(m)x(n—m) (2.4.6) 


m=0 m=0 


To find the output of a discrete system, we use the MATLAB function 


y=filter(b,a,x);%b=row vector of the b's; a=row vector of the a's; 


$x=row vector of the input {x(n)}; y=output vector; 
Taking the z-transform of (2.4.6), and remembering that the transform 


is with respect to n and that the functions /(.) and x(.) are shifted by n, we 
obtain the following relation 





= -m _ *A\SS _ m = B 
Os Den: xe) $ m TE (2.4.7) 
. + > a(m)z™ 


m=1 


The above system is known as the Autoregressive Moving Average (ARMA) 
system. If a’s are zero, then the system in time and in z-domain is given, 
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respectively, by 


q 
yn) =} b(n) x(n — m) ame 


H(z) = Biz) 


The above system is known as the Finite Impulse Response (FIR) or 
nonrecursive. If the b’s are zero besides the b(0), then the system in time and 
z-domain is given by 


p 


y(n)+ X a(m)x(n~m) = bO)x(n) 





SE ~ AQ) 
1+ 5. a(m)z ” 


The above system is known as the Infinite Impulse Response (IIR) or 
recursive. 

Figure 2.4.1 shows the above three systems in their filter realization and 
of the second order each. In this text we will be mostly concerned with the 
FIR filters, since they are inherently stable. 


x(n- 1} 





b(2)x(n — 2) 


Figure 2.4.1(a) FIR system. 
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~a(i)y(n - 1} 


(b) 


Figure 2.4.1(b) IIR system. 


b(2)x{n — 2) 





(c) 


Figure 2.4,1(c) ARMA system. 
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Problems 
2.2.1 Compare the amplitude spectrum at œ = 1.6 rad/s between the con- 
tinuous function x(t) = exp(- ltl) and DTFT of the same signal with sampling 


interval T = 1s and T = 0.1s. 


2.2.2 Repeat Problem 2.2.1 using the DFT. Assume the range for the DFT 
is 6 <t <6, and use sampling intervals T= 1 and T= 0.2 and ø= 77/2. 


23.1 Find the z-transform of the functions (see Table 2.3.2): (a) cos(n@T)u(n); 
(b) na"u(n). 
2.4.1 The transfer function of a discrete system is H (?")=1/0- e/°T) Find 


its periodicity and sketch its amplitude frequency response for T = 1 and T 
= 1/2. Explain your results. 


at bz + cz? 


2.4.2 Find the type (lowpass, highpass, etc.) of the filter H(z) = 
c+bz" +az" 
Hints-solutions-sug gestions 


2.2.1: 


X(@)= f eNe imap = f eo Mt + f ge 
co —00 0 


oo 











1 ETET: 1 l 
= —__——__le (@-l)t 4 eo Vert 
“jo-1)! ort 0! 
1 
= Er : am eas a, 
-(j@-1) (j@+1) (1+@°) 1+1.6 


= -1 
T= = Xe = TY eit TY eei 


ET eT! p-jonT zapat (g ToT y ET Cae ey i 





E T l-e e 41-e Teel 
+T + —— + — = -T4+T 
te en! eT fs ete ler je ef eet = ete jet 4 et 


2267 ) : 
=_TiT e608 OF _, ¥(eil 1) = 0.7475, X(ei- ©) = 0.5635 


1-2e! coswT +e? 
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Aia 

Use the help of MATLAB for the frequency 2/2. (a) T = 1; n = —6:1:6; x = 
exp(-abs(n)); X = 1*fft(x, 500). The frequency bin is 27/(500 — 1) and hence 
k(27/500) = 2/2, which implies that k = 125 and, hence, abs(X(125)) = 0.7674. 
(b) Similarly, for T = 0.2 we find k = 25 and abs(X(125)) = 0.6194. The exact 
value is equal to 2/(1 + (#/2)*) = 0.5768. 





2.031! 
. ore git ite... 1< . 
t2 2, 2 pa 2 


1 1 1 1 z? —zcos@l 


24g et Daeg! et BOF coswr 1 


peN aa dw. df z z 
b Hai n = nee, eae ee ne a S ee 
) >" i » dz ee) dz & m ZEA (z-a 














2.4.1: 
By setting @=@+(22/T) we find that H (e) is periodic with period 27/2. 
From the plots of | H(e’) we observe that the fold-over frequency is at 7/T. 


2.4.2: 
All-pass 


chapter 3 


Random variables, sequences, 
and stochastic processes 


3.1 Random signals and distributions 


Most signals in practice are not deterministic and can not be described by 
precise mathematical analysis, and therefore we must characterize them in 
probabilistic terms using the tools of statistical analysis. 

A discrete random signal {X(n)} is a sequence of indexed random variables 
(rv’s) assuming the values: 


{x(0), x(1), x(2),....} (3.1.1) 


The random sequence with values {x(n)} is discrete with respect to sampling 
index n. Here we will assume that the random variable at any time n is a 
continuous function, and therefore, it is a continuous rv at any time n. This 
type of sequence is also known as time series. 

A particular rv, X(n), is characterized by its probability density function 


(pdf) f(x(1)) 


_ OF(x(n)) 
f(x(n)) = a) (3.1.2) 


and its cumulative density function (cdf) F(x (n)) 
x(t) 


F(x(n)) = p(X(n) < x(n) = | fly(n))dy(n) (3.1.3) 


—oo 
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p(X(n) < x(n)) is the probability that the rv X(n) will take values less than or 
equal to x(n) at time n. As the value of x(n) goes to infinity, F(x(n)) approaches 
unity. Similarly, the multivariate distributions of rv’s are given by 


F(x(n,),-++, x(n, )) = p(X(n,) < x(n, ),---, X(n) S x(n, )) 


F F(x(n,),--+,x(n,)) (3.1.4) 
x(n, ) +++ x(n, ) 





F (x(n, ),--+ x(n) = 


Note that here we have used a capital letter to indicate rv. In general, we 
shall not keep this notation, since it will be obvious from the context. 

To obtain a formal definition of a discrete-time stochastic process, we con- 
sider an experiment with a finite or infinite number of unpredictable outcomes 
from a sample space, S(Z,, 2, ...), each one occurring with a probability p(z; ). 
Next, by some rule we assign a deterministic sequence x(n, z; ), 9 <n <2, to 
each element z; of the sample space. The sample space, the probabilities of each 
outcome, and the sequences constitute a discrete-time stochastic process or random 
sequence. From this definition we obtain the following four interpretations: 


x(n,z) is an rv if n is fixed and z is variable. 

x(n,z) is a sample sequence called realization if z is fixed and n is variable. 
x(n,z} is a number if both n and z are fixed. 

x(n,z) is a stochastic process if both n and z are variables. 


Each time we run an experiment under identical conditions, we create 
a sequence of rv’s {X()}, which is known as a realization and constitutes an 
event. A realization is one member of a set called the ensemble of all possible 
results from the repetition of an experiment. 


Book MATLAB script file 


To create a script file, we first create the file, as shown below, as a new 
MATLAB file. Then we save the file named ‘realizations.m’, for example, in 
the directory c:\aamatlab. When we are in the MATLAB command window, 
we attach the above directory to MATLAB path using the following com- 
mand: path(path,’c:\aamatlab’). Then the only thing we have to do is to 
write: realizations and automatically the MATLAB will produce Figure 3.1.1, 
which shows four realizations of a stochastic process with zero mean value. 


$script file: realizations 

for n=1:4 

x(n, :)=rand(1,50)-0.5;%produces matrix x with 4 rows 
Sand 50 columns of zero mean white noise; 

end; 

m=0:49; 

for 1=1:4 

subplot (4,1,1);stem(m,x(1,:),'k');%plots four rows of matrix x; 
end; 
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Four realizations 
Q) OO O 
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Figure 3.1.1 Four realizations of zero mean white noise. 


Stationary and ergodic processes 


It is seldom in practice that we will be able to create an ensemble of a random 
process with numerous realizations so that we can find some of its statistical 
characteristics, e.g., mean value, variance, etc. To find these statistical quan- 
tities we need the pdf of the process, which, most of the time, is not possible 
to produce. Therefore, we will restrict our studies to processes that are easy 
to study and easy to handle mathematically. 

The process that produces an ensemble of realizations and whose sta- 
tistical characteristics do not change with time is called stationary. For exam- 
ple, the pdfs of the rv’s x(n) and x(n + k) of the process {x(n)} are the same 
independently of the values of n and k. 

Since we are unable to produce ensemble averages in practice, we are left 
with only one realization of the stochastic process. To overcome this difficulty 
we assume that the process is ergodic. This characterization permits us to find 
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the desired statistical characteristics of the process from only one realization 
at hand. We refer to those statistical values as sample mean, sample variance, etc. 


3.2 Averages 

Mean value 

The mean value or expectation value yt, at time n of a random variable x(n) 
having a pdf f(x(7)) is given by 


= Ele = | x0) fdln) (3.2.1) 


—co 


where E{-} stands for expectation operator. We can also use the ensemble of 
realizations to obtain the mean value using the frequency interpretation formula 


N 


1 T — 
= mli $o N = number of realizations (3.2.2) 


i=1 


where x(n) is the tt outcome at sample index n (or time n) of the 7 realiza- 
tion. Depending on the type of the rv, the mean value may or may not vary 
with time. 

For an ergodic process, we find the sample mean (estimator of the mean) 
using the time-average formula 


ù= — xn) 625) 


It turns out (see Problem 3.1.2) that the sample mean fi is equal to the 
population mean and, therefore, we call the sample mean an unbiased estimator. 


Correlation 
The cross-correlation between two random sequences is defined by 

r 0m n) = Elx) yn) = Sf x(m)yin) fan), yodxmdyo 6.24 
where the integrals are from minus infinity to infinity. If x(n) = y(n), the 
correlation is known as the autocorrelation. Having an ensemble of realiza- 


tions, the frequency interpretation of the autocorrelation function is found 
using the formula 


? 
r (m,n) = lim Eda (3.2.5) 


Note that we use one subscript for autocorrelation functions. In case we have 
cross-correlation we will use both subscripts. 
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Example 3.2.1: Using Figure 3.1.1, find the mean for n = 15 and the 
autocorrelation function for time difference of five, n = 20 and n = 25. 


Solution: The desired values are 


4 
1 ») 1 
Ls = 4 x (15) = MI +0.14+0.2- 0.1) = 0.05 


r (20,25) = ye (20)x,(25) = lt 0.4)(0.2) + (— 0.1) (0.45) + (0.25)(0.2) 
+(—0.45)(-0.3)] = 0.0150 


Because the number of realizations is very small, both values found above 
are not expected to be accurate. 


Figure 3.2.1 shows the mean value at 50 individual times and the 
autocorrelation function for 50 differences (from 0 to 49) known as lags. 





Autoc. for 10 realizations Autoc. for 400 realizations 


Figure 3.2.1 Means and autocorrelations based on frequency interpretation. 
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These results were found using the MATLAB function given below. Note 
that as the number of realizations increases, the mean tends to zero and the 
autocorrelation tends to a delta function, as it should be, since the random 
variables are independent (white noise). 


Book MATLAB function to find the mean and autocorrelation function using 
the frequency interpretation formula: 


function [mx,rx]=aameanautocensemble (M,N) 

$function[{mx, rx]=aameanautocensemble (M,N) ; 

we create an MxN matrix;M=number of realizations; 
%N=number of time slots;easily modified for other types 
%of pdf's; 

x=randn (M,N) ;%randn=gives Gaussian distributed white noise 
mx=sum(x,1)/M;%with zero mean;sum(x,1)=sums all the rows; 
for i=1:N %sum(x,2)=sums all the columns; 
rx(i)=sum({x(:,1).*x(:,ił})})/M; 

end; 


To find the sample autocorrelation function from one realization we can 
use the formula 


N-|ml-1 
` x(n)x(n+ m) m=0,1,---,N-1 (3.2.6) 


n=0 


= Ne 


The absolute value of m ensures the symmetry of the sample autocorrelation 
function at n = 0. Although this formula gives an unbiased autocorrelation 
function, it sometimes produces autocorrelation matrices (discussed below), 
which do not have inverses. Therefore, it is customary in practice to use any 
one of the biased formulas 





N-|m|-1 
es > x(n)x(n + |m) O<m<N-1 
or (3.2.7) 
1 N-1 
Fon) = 3 Deen) O<msN-1 


Book MATLAB function to find the unbiased autocorrelation function: 


function[r]=aasampleunbisedautoc (x, lg) 
S$this function finds the unbiased autocorrelation function 
from 0 to lag lg;it is recommended that lg is about 20-30% of N; 


Chapter 3: Random variables, sequences, and stochastic processes 25 


Ne=length (x) ;x=data; 
for m=l:lg 

for n=1:N+1-m 

xs (m,n)=x(n-1i+m) ; 
end; 

end; 

ri=xs*x'; 

for m=1:lg 

den (m) =N+1~m; 
end; 

r=r1'./den; 


Book MATLAB function to find the biased autocorrelation function: 


function[r]=aasamplebiasedautoc(x,1g) 

this function finds the biased autocorrelation function 
with lag from 0 to lg; it is recommended that lg is 20-30% of 
SN; 

N=length(x) ;%x=data;lg=lag; 

for m=l:lg 

for n=1:N+1-m 

xs (m,n)=x(n-1i+m) ; 


end; 

end; 
ri=xs*x'; 
r=r1'./N 


We can also use MATLAB function to obtain the biased or unbiased 
sample autocorrelation and cross-correlation. The function is: 


r=xcorr (x,y, ‘biased’); % for the biased cased, and 
r=xcorr (x, y, unbiased’): % for the unbiased case. 
x,y are N length vectors; r is a 2N-1 symmetric 
cross-correlation vector; 

in case the vectors do not have the same length, the 


shorter one will be zero-padded; 


co oO 


oe GO 


Note: If none of the options (i.e., biased or unbiased) is used, the default value 
is biased and the result will not be divided by N. 

The reader is encouraged to find several interesting options in using the 
xcorr command by writing help xcorr or doc xcorr on the MATLAB com- 
mand window. 


Covariance 


The covariance of a random sequence is defined by 


c (m,n) = E{(x(m) = 1, (x(n) - u} mat 


= E{x(m)x(n)}— u u, =r (m,n)- HH, 
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The variance is found by setting m = n in (3.2.8). Hence, 


c (n,n) = 0? = El(x(n)— u, Y f= Eien- we? 


If the mean value is zero, then the variance and the correlation function are 
identical. 


c (n,n) = 07 = E{x*(n)| =r (n,n) (3.2.9) 


Independent and uncorrelated ru's 


If the joint pdf of two rv’s can be separated into two pdf’s, f (m) J m) 
Í y(t, n)= f (m) f0) , then the rv’s are statistically independent. Hence, | 


E {x(m)x(n)} =F {x(m)}E {x(n)} =u H, (3.2.10) 


The above equation is a necessary and sufficient condition for the two 
random variables x(m), x(n) to be uncorrelated. Note that independent 
random variables are always uncorrelated. However, the converse is not 
necessarily true. 

If the mean value of any two uncorrelated rv’s is zero, then the random 
variables are called orthogonal. In general, two rv’s are called orthogonal if 
their correlation is zero. 


3.3 Stationary processes 


For a wide-sense (or weakly) stationary process, the cdf satisfies the relationship 
F(x(m), x(n)) = F(x(m + k), x(n + k)) (3.3.1) 


for any m, n, and k. If the above relationship is true for any number of rv’s 
of the time series, then the process is known as strictly stationary process. 

The basic properties of a wide-sense stationary process are (see 
Problem 3.3.1): 


u (n) = u = constant r (k)= r (-k) 


r (m,n)=r (m-n) r (0) >r (k) (3.3.2) 
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Autocorrelation matrix 


If x = [x(0) x(1) .... x(p)]" is a vector representing a finite random sequence 
then the autocorrelation matrix is given by 


E{x(O)x(0)} Efx xD} --- E{x(0)x(p)} 
| Etx()x(0)} EDOD) o Exp) 


Efx(p)x(0)} E{x(p)x(1))-- E{x(p)x(p)] 
TAO 1D 1p) 


r r (0) tops) 


(3.3.3) 


r(p) r p= r0) 


Example 3.3.1: Find: (a) the unbiased autocorrelation with lag 20 of a 40 
term sequence; (b) the biased autocorrelation with the same settings; and (c) 
a 4 x 4 autocorrelation matrix. Use a sequence of rv’s having Gaussian 
distribution and zero mean value. 


Solution: We use the MATLAB function x = randn(1,40) to create the 
sequence {x(n)} of 40 terms, which are white rv’s and Gaussian distributed. 
Then we use the two MATLAB functions, which were given above, 
ru=aasampleunbiasedautoc(x,20) and rb=aasamplebiasedautoc(x,20) to pro- 
duce the results shown in Figure 3.3.1. To create the 4 x 4 autocorrelation 
matrix we use the following MATLAB function: R= toeplitz (rb(1,1:4), 
rb(1,1:4));. 


0.8613 -0.0363 -0.0232 0.2112 
~0.0363 0.8613 -0.0363 -0.0232 
R = (3.3.4) 
~0.0232 -0.0363 0.8613 -0.0363 
0.2112 -0.0232 -0.0363 0.8613 


Note the symmetry of the matrix along the diagonals. This type of matrices 
is known as Toeplitz. 


Note: The output vector rb is a row vector, and the parenthesis (1,1:4) 
instructs MATLAB to take the fist row of rb and the first four columns of 
that row. 
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Figure 3.3.1 Illustration of Example 3.3.1. 


Note: If we have a row vector x and need to create a row vector y with 
elements of x from k to m only, we write 


y=x(1,k:m); 


If x is a column vector, and we need to find a column vector y with 
elements of x from k to m only, we write 


y=x(k:m, 1); 


Example 3.3.2: Let {v(n)} be a zero mean, uncorrelated Gaussian ran- 
dom sequence with variance o°(n)= 0" = constant. 


a. Characterize the random sequence {v(n)}. 

b. Determine the mean and the autocorrelation of the sequence 
{x(n)} if x(n) = v(n) + av(n — 1), in the range œ <n <œ where a is 
a constant. 


Solution: (a) the variance of {v(n)} is constant and, hence, is independent 
of the time, 7. Since {v(n)} is an uncorrelated sequence, it is also independent 
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due to the fact that it is Gaussian sequence. From (3.2.8) we obtain 
c (m,n)=r (m, n)- u, u, =r, (m,n) or & = r,(n,n) = constant. Hence, r,(m,n) = 
o2h(m -— n), which implies that {v(n)} is a WSS process. (b) E{x(n)} = 0 since 
E{v(n)} = E{v(n—-1)} = 0. Hence, 


r (m,n) = Efon) + av(m — DJen) + a(n- DI} = Efo(m)o(n)} 
+aE{v(m—1)o(n)} + aE{o(m)o(n - 2} + a Efo(m — Vo(n-1)} = r (m,n) 
+ar(m—1,n)+ar,(m,n-1)+a°r,(m—1,n-1) = 0°d(m—n)+a0°6(m-n+1) 
+80 S(m—n) = (1+.a7)o75(1) + a075(1— 1) + ao’ 5(1 +1) 


Since the mean of the process {x(1)} is zero (constant), and its autocorrelation 
is a function of the lag factor I = m — n, it is a WSS process. 


3.4 Special random signals and probability 
density functions 
White Noise 


A WSS discrete random sequence that satisfies the relation 


Flx(0), x(1), ... ) =flx(0))f(x(1))... (3.4.1) 


is a pure random sequence whose elements x(n) are statistically independent 
and identically distributed (iid). Therefore, the zero mean iid sequence has 
the following correlation function 


r (m—n) = E{x(m)x(n)} = 0-6(m— n) (3.4.2) 


where o? is the variance of the signal and &m — n) is the discrete-time delta 
function. For m#n, &m —n) =0 and, hence, (3.4.2) becomes 


o7d6(k) k=0 
r= * (3.4.3) 
0 k #0 


For example, a random process consisting of a sequence of uncorrelated 
Gaussian rv’s is a white noise process referred to as white Gaussian noise 
(WGN). 
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Gaussian processes 


The pdf of a Gaussian rv x(n) at time n is given by 


fom g a E N o (3.4.4) 


i 
210 
Example 3.4.1: Find the joint pdf of a sequence of WGN with n elements; 
each one having zero mean value and the same variance. 


Solution: The joint pdf is 


FD x(2),--+, x(n) = f(x) fx) fF (en) 
(3.4.5) 


x 


1 15., 
= nf H ap- x J 
(2a 0" 20° 4m 


A discrete-time random process {x(n)} is said to be Gaussian if every 
finite collection of samples of x(n) are jointly Gaussian. A Gaussian random 
process has the following properties. (a) It is completely defined by its mean 
vector and covariance matrix. (b) Any linear operation on the time variables 
produces another Gaussian random process. (c) All higher moments can be 
expressed by the first and second moments of the distribution (mean, cova- 
riance). (d) White noise is necessarily generated by iid samples (indepen- 
dence implies uncorrelated rv’s and vice versa). 

To produce a WGN with zero mean and unit variance, the following 
MATLAB function can be used: 


x=randn(1i1,N);%x is a row vector with N elements of WGN type 
% with zero mean and unit variance; 


In case it is desired to change the mean and the variance, we use the 
following transformation of the vector x. 


z=a*x+m; % the variance of z equals a’, and its mean equals m; 


Exponential distribution 


eri) Osx<0, b>0 


f(x)=40 (3.4.6) 
0 otherwise 
Algorithm 
1. Generate u from a uniform distribution (0,1) 
2. x = -—bln(bu) 


3. keep x 
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Book MATLAB function 


function[x,m, sd]=aaexponentialpdf (b,N) 
gfunction[x,m,sd]=aaexponentialpdf (b,N) 


for 1i=1:N 

x(i)=-b*log(rand); log is MATLAB function that gives the 
Snatural algorithm; 

end; 


m=mean(x); *mean(x)=MATLAB function providing the mean value; 
sd=std(x); %std(x)=MATLAB function providing the standard 
deviation which is the square root of the variance; 


Normal distribution 





(x- ee 
x)= ex 3.4.7 
fa) = p T p|- T (3.4.7) 
Algorithm 
1. Generate two independent rv’s u; and u, from uniform distribution 
(0,1) 


2. x = (-2In(u,))'/*cos(2mu,) (or x, = (-2Inu,)!/*sin(27u,)) 
3. Keep x, or x, 


Book MATLAB function 
function[x]=aanormalpdf (m,s,N) 
%$function[x]=aanormalpdf (m,s,N); 
$s=standard deviation;m=mean value; 
for 1=12N 
rl=rand; 
r2=rand; 

Zilli) =sert (-2* log (r1).) =cos (274 p1* 72) 5 
end; 
x=s*z+m; 


Lognormal distribution 
Let the rv x be N(u,0*). Then y = exp(x) has the lognormal distribution with pdf 
Ne 
ee (eee TEET 
fly)=4 V2n0y 20 (3.4.8) 
0 otherwise 


The values of 6 and u must take small values to form a lognormal-type 
distribution. 
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Algorithm 
1. Generate z from N(0,1) 
2. x=u+0z (xis N(ji,o7)) 


3. y = exp(x) 
4. Keep y 
Book MATLAB function 


function [y]=aalognormalpdf (m, s,N) 
$function[y]=aalognormalpdf(m,s,N) ; 

m=mean value;s=standard deviation;N=number of samples; 
for i=1:N 

ri=rand; 

r2=rand; 

z(i}=sqrt(-2*log(r1}))*žcos(2*pi*r2); 

end; 

x=m+s* zZ; 

y=exp (x) ; 


Chi-Square distribution 
If z Zz are N(Q, 1), then 


y= Dae (3.4.9) 


i=l 


has the chi-squared distribution with k degrees of freedom and is denoted by 7°(k). 
To produce graphically the pdf using any of the above functions, we 
may use the following MATLAB function: 


hist(x,b);% x is a vector consisting of the values of the rv's; b is 
% the number of bins required in finding the distribution of x; 


3.5 Wiener—Khintchin relations 


For a WSS process, the correlation function asymptotically goes to zero and, 
therefore, we can find its spectrum using the discrete-time Fourier transform. 
Hence, the power spectrum is given by 


oo 


S (el) = Ye (3.5.1) 


k=—0co 


This function is periodic with period 27 (exp(—jk(@ + 27)) = exp(—jkq@)). Given 
the power spectral density, the autocorrelation sequence is given by the relation 


r (k)= ia f S (e de (3.5.2) 
BN DI xX 
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For real process, r,(k) = r,(—k) (symmetric function) and as a consequence the 
power spectrum is even function. Furthermore, the power spectrum of WSS 

rocess is nonnegative. These two assertions are given below in the form of 
mathematical relations. 


S (e)=S (e) = Se") 
| (3.5.3) 
S (el) 20 


Example 3.5.1: Find the power spectra density of the sequence x(n) = 
sin(0.1*2*pi*n) + 1.5*randn(1, 32) withn=[0 1 2... 31]. 


Solution: The following Book MATLAB program produced Figure 3.5.1. 


n=0:31; 

s=zsin(0.1*2*pi*n); 
v=randn(1,32);%Gaussian white noise; 
X=StV; 












E Y 
Krneza 


0 10 20 30 40 
Freq. bins 






Figure 3.5.1 Illustration of Example 3.5.1. 


34 Adaptive filtering primer with MATLAB 


r=xcorr(x, 'biased’); %the biased autocorrelation is divided 
by N, here by 32; 

fs=fft (s); 

fr=fft(r,32); 

subplot (3,2,1);stem(n,s,'k’);xlabel 


neke yLlabel tsn] ")s 
subplot (3,2,2);stem(n,v,'k’);xlabel (* 

P 

=) 


n 

in) ylaped Atm) 3 
n’);ylabel(‘*x(n)’); 
>xlabel(‘k, time ... 


subplot(3,2,3);stem(n,x,’k’);xlabel 

subplot (3,2,4);stem(n,r(1,32:63), 
lags’) ;ylabel(‘r(k)’); 

subplot (3,2,5);stem(n,abs(fs),‘'k’);xlabel(‘fregq. bins’)... 
;ylabel (‘S_s(e*{j\omega}’); 

subplot (3,2,6);stem(n,abs(fr),’k’);xlabel(‘freq. bins’); 
ylabel(*S_x{e*{j\omega}’); 


3.6 Filtering random processes 


Linear time-invariant filters are used in many signal processing applications. 
Since the input signals of these filters are usually random processes, we need to 
determine how the statistics of these signals are modified as a result of filtering. 

Let x(n), y(n), and h(n) be the filter input, filter output, and the filter 
impulse response, respectively. It can be shown (see Problem 3.6.1) that if 
x(n) is a WSS process, then the filter output autocorrelation r (5) is related 
to the filter input autocorrelation r,(k) as follows. 


r (k)= y y h(I)r (m—1+ kyh(m) = r (k)* h(k) * h(k) (3.6.1) 


l = -œ m = =% 


The right-hand expression of (3.6.1) shows convolution of three functions. 
We can take the convolution of two of the functions, and the resulting 
function is then convolved with the third function. The results are indepen- 
dent of the order we operate on the functions. 

From Table 2.3.1, we know that the z-transform of the convolution of 
two functions is equal to the product of their z-transforms. Remembering 
the definition of the z-transform, we find the relationship (the order of 
summation does not change the results) 


Z{h(-k)} = is kz = Y ime” = H(z”) (3.6.2) 


HI=oa 


Therefore, the z-transform of (3.6.1) becomes 


R (2) = Z4r (K) * AZAK) = R, (2)H(@)H(z") (3.6.3) 
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If we set z = e}? in the definition of the z-transform of a function, we find the 
spectrum of the function. Having in mind the Wiener-Khintchin theorem, 
(3.6.3) becomes 


S (e°) = S (eP H] (3.6.4) 


The above equation shows that the power spectrum of the output random 
sequence is equal to the power spectrum of the input sequence multiplied 
by the square of the absolute value of the spectrum of the filter transfer 
function. 


Example 3.6.1: An FIR filter is defined in the time domain by the differ- 
ence equation: y(n) = x(n) + 0.5x(n — 1). If the input signal is a white Gaussian 
noise, find the power spectrum of the output of the filter. 


Solution: The z-transform of the difference equation is Y(z) = (1 + 0.5z7)X(z) 
(see Chapter 2). Since the ratio of the output to input is the transfer function 
of the filter, the transformed equation gives H(z) = Y(z)/X(z) = 1 + 0.5z+. The 
absolute value square of the spectrum of the transfer function is given by 


. be : f z ; 
He”) = (1+ 0.5e\14 0.5e) =14+ 0.5(e +e) + 0.25 =1.25+cosm 
(3.6.5) 
where the Euler identity e”? = cos@+jsin@ was used. Figure 3.6.1 shows 
the sequence x{n}, the autocorrelation function {n}, and the power spec- 


trums of the filter input and its output, S,(@) and 5S,(o), respectively. Note 
that the spectrums are symmetric around w= 7. 


tjo _ 


Spectral factorization 


The power spectral density S e") of a WSS process {x(n)} is a real-valued, 
positive and periodic function of @. It can be shown that this function can 
be factored in the form 


S (e) = o Q(z)Q(z") (3.6.6) 


where 
1. Any WSS process {x(n)} may be realized as the output of a causal 
and stable filter A(n) that is driven by white noise v(m) having variance 
o}. This is known as the innovations representation of the process. 
2. If x(n) i is filtered by the filter 1/H(z) (whitening filter), the output is a 
white noise v(m) having variance o-. This process is known as the 
innovations process and is given by 


[o? H(z)H(z")] ToS -= 0? (3.6.7) 
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Figure 3.6.1 Illustration of Example 3.6.1. 


where the first bracket is the input power spectrum, 5$,(z), and the 
second represents the filter power spectrum. 

3. Since v(m) and x(n) are related by inverse transformations, one process 
can be derived from the other and they contain the same information. 


3.7 Special types of random processes 
Autoregressive moving average process (ARMA) 


A stable shift-invariant system with p poles and q zeros can be represented 
in the general form 


Ye" 


H(z)= ra = 0 o (3.7.1) 


1+ dy ae" 


k=1 
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If a white noise, v(n), with variance o? is the input to the above filter, the 
output process x(n) is a WSS process and its power spectrum is given by 
(see (3.6.6)) 


2 B(z)B(z*) 








IE AVA) 
or (3.7.2) 
Be 
5 (e) = o; 2 
Alee) 





This process, which has the above power spectrum density, is known as the 
autoregressive moving average process of order (p, q). 

It can be shown that the following correlation relationship exists for this 
process: 


p Za 

r (k)+ Daten -m)= i j l ; <4 (3.7.3) 
q g-k 

c(k) = X b(myntn _k)= > oon + k)h(m) (3.7.4) 


These equations are known as the Yule—Walker equations and can be written 
in the following matrix form for k = 0,1, ...,p + q: 


r(0) rh - r (=p) (0) 
PAJ FAQ) F rPI c(1) 
. li . 
a(1) 
Ap = | 
r€) r (4-1) aCe Nel p 08 ere 
r(q+l1) r(q) r (q-p+1) 0 
a(p) 


rtp) wigep+ l= r (q) 
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It is apparent from the above equation that knowing the autocorrelation of 
a process produced by an ARMA model, we may be able to find the coeffi- 
cients of the process. This task for the ARMA model is rather difficult due 
to nonlinear terms that appear in (3.7.4). 


Autoregressive process (AR) 


A special and important type of the ARMA process results when we set gq = 0. 
Hence, from (3.7.1), (3.7.2), (3.7.3), and (3.7.4) we obtain the relations: 


H(z)= i (0) (3.7.6) 
1+ ` a(k)z ~“ 
Se") =e. 20; (3.7.7) 
jace") 

r (k)+ Y amy (k - m) = 02b(0)’ 5(k) k20 (3.7.8) 

r (0) LED as ZEP 1 1 

r (1) r (0) ae PP ED fal 0 
= 07b(0)"| | (3.7.9) 

r (p) ripa Tp = - £0) ap) 0 


Example 3.7.1: Find the AR coefficients a(1)-a(5) if the autocorrelation 
of the observed signal {x(n)} is given. Assume the noise variance is equal to 
one. Find the power spectrum of the process. 


Solution: First we produce a WSS process {x(n)} by passing a white noise 
through a linear time-invariant filter. In this example we use a second order 
AR filter: x(n) ~ 0.9x(n — 1) + 0.5x(n — 2) = v(n). The variance of the white 
noise input to the filter, v(7), is one, and its mean value is zero. The results 
are shown in Figure 3.7.1, where S(@) =|H(e/®)I2. The Book MATLAB pro- 
gram used to produce these results, and the AR coefficients is given below. 


$Example 3.7.1 
g=3.5* (rand(1,500)-0.5); % g: uniformly distributed WGN 
of zero mean and unit variance. 


oe 
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Figure 3.7.1 Illustration of Example 3.7.1. 


x=filter(1,[1 -0.9 0.5],g); % x: observed signal 
[rx, lags]=xcorr (x, x, 15, 'biased'); % rxl=autocorrelation of x. 
R=toeplitz(rx(1,1:6)); % we create the 6x6 autocorrelation 


Q 


% matrix; 

Rl=inverse of R. 

find b(0) squared. 

find the AR coefficients: 
a({1) ... a(5); 

H is the frequency response 
of the filter. 


de 


Rl=inv (R); 
b2=1/R1 (1,1); bO=sqrt (b2); 
a=b2*R1 (2:6,1) 


de 


ae 


H=b0./ (£ft([1 a'],512}); 


de dP oe 


nn=0:511; w=2*pi*nn/512; 

S=(H.*conj (H))/512; subplot (222); plot (w,S) ;xlabel('\omega'); ... 
ylabel('S(w)'); 

subplot (221); n=0:49; stem(n,x(1:50),'filled');xlabel 
("n"); ylabel('x[n]"); 

subplot (223); stem(lags,rx, 'filled');xlabel('Lags');... 
ylabel('rx[n]'"); 

subplot (224); X1l=fft(x,512); Sx=X1.*conj (X1)/512;... 
plot (w, Sx); 

xlabel ('\omega');ylabel('Sx(w)'); 
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Moving average process (MA) 


The autocorrelation coefficient r,(k) depends nonlinearly on the filter coeffi- 
cients b(k), and as a consequence, estimating the coefficients is a non-trivial 
problem. 


3.8 Nonparametric spectra estimation 


The spectra estimation problem in practice is based on finite-length record 
{x(1), ..., x(N)} of a second-order stationary random process. However, har- 
monic processes having line spectra appear in applications either alone or 
combined with noise. 


Periodogram 


The periodogram spectral estimator is based on the following formula: 


2 


z Serf (3.8.1) 


N-1 


bI x(nje T” 


n=0 


E: 
JON _ _ 
MEN 








where S (e) is periodic with period 27r, -z < œ < z, and X(e"”) is the DFT 
of x(n) (see Section 1.5). The periodicity is simply shown by introducing w+ 27 
in place of @ in (3.8.1) and remembering that exp(j2”) = 1. 


Correlogram 


The correlogram spectral estimator is based on the formula 

E N-1 

ô (e?) = > Pme" (3.8.2) 

m=-(N-1) 

where f(m) is the estimate of the correlation (assumed zero mean value of 
{x(n)}) given by (3.2.6). It can be shown that the correlogram spectral 
estimator evaluated using the standard biased autocorrelation estimates 
coincides with that of the periodogram spectral estimator. As in (3.8.1), the 
correlogram is periodic function of @ with period 27. 


Computation of 5 (e) and 8 (e) using FFT 


Since both functions are continuous functions of @, we can sample the frequency 
as follows: 


Dek EAN (3.8.3) 
N 


This situation reduces (3.8.1) and (3.8.2) to finding the DFT at those frequencies: 


N-1 N-} 


X(el) = Y anye” we Y anw", OSKSN-1 (3.8.4) 


n=0 n= 
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and thus, 


2 X ; siat. ER ki 
; Se) = 5 Pane N (3.8.5) 


m=-(N-1) 


C (2k e S ja, 
ê (e G ) 





The most efficient way to find the DFT using FFT is to set N = 2” for some 
integer r. The following two MATLAB functions give the windowed peri- 
odogram and correlogram, respectively. 


Book MATLAB function for the periodogram 


function[s]=aaperiodogram(x,w,L) 

%$function[s]=aaperiodogram(x,w,L) 

Sw=window (@name, length(x)), (name=hamming, kaiser, hann, rectwin, 

sbartlett, tukeywin, blackman, gausswin,nattallwin,triang, 
S$blackmanharris) ; 

SL=desired number of points (bins) of the spectrum; 

$x=data in row form;s=complex form of the DFT; 

XW=X. *w' 

for m=1:L 

n=1:length (x); 

s (m)=sum (xw. *exp (-j* (m-1)* (2*pi/L) *n)); 

end; 

$as=((abs(s)).*2/length(x))/norm(w)=amplitude spectral 
density; 

Sps=(atan(imag(s)./real(s))/length({x))/norm(w)=phase spectrum; 


To plot as or ps we can use the command: plot(0:2*pi/L:2*pi-(2*pi/L),as). 


Book MATLAB function for the correlogram 


function[s]=aacorrelogram(x,w,lg,L) 

$function[s]=aacorrelogram(x,w,lg,L); 

$x=data with mean zero;w=window(@name,length(2*lg)), see 
S$aaperiodogram 

$function and below this function) ;L=desired number of 
$spectral points; 

$lg=lag number<<N;rc=symmetric autocorrelation function; 

r=aasamplebiasedautoc(x,lg); 

palo) istic toh bree Gag: larg? ioe Wop aero oma OIE 

rew=rc.*w'; 

for m=1:L 

n=-lg+l1:lg; . 

s (m)=sum(rew. *exp (-j* (m-1) * (2*pi/L) *n)); 

end; 

Sasc=(abs(s).*2)/norm(w)=amplitude spectrum; 

spsc=(atan(imag(s))/real(s))/norm(w)=phase spectrum; 
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General Remarks on the Periodogram 

1. The variance of the periodogram does not tend to zero as N=. This 
indicates that the periodogram is an inconsistent estimator; that is, its 
distribution does not tend to cluster more closely around the true 
spectrum as N increases. 

2. To reduce the variance and, thus, produce a smoother spectral esti- 
mator we must: a) average contiguous values of the periodogram, or 
b) average periodogram obtained from multiple data segments. 

3. The effect of the sidelobes of the windows on the estimated spectrum 
consists of transferring power from strong bands to less strong bands 
or bands with no power. This process is known as the leakage problem. 


Blackman—Tukey (BT) method 


Because the correlation function at its extreme lag values is not reliable due 
to the small overlapping of the correlation process, it is recommended to use 
lag values about 30-40% of the total length of the data. The Blackman-Tukey 
estimator is a windowed correlogram and is given by 


Sar (e”) = 5 wm Pme (3.8.6) 


m=-(L-1) 


where w(m) is the window with zero values for Iml> L-1 and L «< N. The 
above equation can also be written in the form 


Sgr(e®)= 2. w(m) r(m)e 1" 


(3.8.7) 
= S.(e”) * Wel”) = Ea Í Se (e7 Wel” )dt 
20 

-m 
where we applied the DTFT frequency convolution property (the DTFT of 
the multiplication of two functions is equal to the convolution of their 
Fourier transforms). Since windows have a dominant and relatively strong 
main lob, the BT estimator corresponds to a “locally” weighting average 
of the periodogram. Although the convolution smoothes the periodogram, 
it reduces resolution in the same time. It is expected that the smaller the 
L, the larger the reduction in variance and the lower the resolution. It turns 
out that the resolution of this spectral estimator is on the order of 1/L, 

whereas its variance is on the order of L/N. 

For convenience, we give some of the most common windows below. 
For the Kaiser window the parameter f trades the main lobe width for the 
sidelobe leakage; B = 0 corresponds to a rectangular window, and p> 0 
produces lower sidelobe at the expense of a broader main lobe. 


Rectangle window 
w(n) = 1 nan Dörrar tee 
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Bartlett (triangle) window 


n 





— n=O le Ly 
L/2 

w(n) = 
ae n=—+],.., L-1 
E2 


Hann window 
2n 
v40)=03| 1 -cos{ 2) n=% be b=] 
Hamming window 
27 
w(n) = 0.54 — 0.46 cos z n n=0,1,...,L—-1 


Blackman window 


43 


w(n) = 0.42 +0.5cos 2, ~ £) + 0.08 cos > (n — J n=1,2,...,L—1 


Kaiser window 





wn) = —(L-1)sn<L-1 





k 
o 
=| \ 2 m , 
Ex) = = zero-order modified Bessel function 
0 k! 
k=0 i 


w(k)= 0 for kl > L, w(k) = w(—k) and equations are valid for0 <k < L-1 


Note: To use the window derived from MATLAB we must write 


w=window (@name, L) 


name=the name of any of the following windows: Bartlett, 
barthannwin, blackman, blackmanharris, bohmanwin, chebwin, 


gausswin, hanning, hann, kaiser, natullwin, 
rectwin, tukeywin, triang. 
L=number window values 
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Figure 3.8.1 Bartlett method of spectra estimation. 


Bartlett method 


Bartlett’s method reduces the fluctuation of the periodogram by splitting up 
the available data of N observations into K = N/L subsections of L observa- 
tions each, and then averaging the spectral densities of all K periodograms 
(see Figure 3.8.1). The MATLAB function below provides the Bartlett peri- 
odogram. 


Book MATLAB function for the Bartlett method 


function[as,ps,s]=aabartlettpsd(x,k,w,L) 

$x=data;k=number of sections; w=window 
%(@name, floor (length(x)/k)); 

$L=number of points desired in the FT domain; 

$K=number of points in each section; 

K=floor(length(x) /k) ; 

s=0; 

ns=1; 

for m=1:k 

s=staaperiodogram(x(ns:ns+K-1),w,L)/k; 

ns=ns-+tkK ; 

end; 

as=(abs(s)) .*2/k; 

ps=atan(imag(s)./real(s))/k; 
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Welch method 


Welch proposed modifications to Bartlett method as follows: data segments 
are allowed to overlap and each segment is windowed prior to computing 
the periodogram. Since, in most practical applications, only a single realiza- 
tion is available, we create smaller sections as follows: 


x(n) = x(iD + n)w(n) O<n<M-1, O<i<K-1 (3.8.8) 


where w(n) is the window of length M, D is an offset distance and K is the 
number of sections that the sequence {x(n)} is divided into. Pictorially the 
Welch method is shown in Figure 3.8.2. 

The i periodogram is given by 


L-1 2 


> 1 | 
GO Ve: —jon 
S,(e”) == DHE (3.8.9) 
and the average periodogram is given by 
1 K-I 
joy > jo 
S(e*)= 5 doe ) (3.8.10) 
0 N-1 

















+ 


Periodogram 2 


+ 










Periodogram K 


Total/K Averaging 


PSD estimate 





Figure 3.8.2 Welch method of spectra estimation. 
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If D = L, then the segments do not overlap and the result is equivalent to 
the Bartlett method with the exception of the data being windowed. 


Book MATLAB function for the Welch method 


function{as,ps,s,K]=aawelch(x,w,D,L) 

6function[as,ps,s,K]=aawelch(x,w,D,L); 

$M=length(w)=section length; 

$L=number of samples desired in the frequency domain; 

$w=window(@name,length of sample=length(w) ) ;x=data; 

%D=offset distance=fraction of length(w),mostly 50% of 
3M ;M<<N=length (x); 

N=length (x); 

M=length (w) ; 

K=floor( (N-M+D)/D);%K=number of processings; 

s=0; 

for i=1:K 

s=s+aaperiodogram(x(1, (1-1) *D+1: (1-1) *D4+M) ,w,L); 

end; 

as=(abs(s)).*%2/(M*K) ;%as=amplitude spectral density; 

ps=atan(imag(s)./real(s))/(M*K);%phase spectral density; 


The MATLAB function is given as follows: 


P=spectrum(x,m)%x=data; m=number of points of each section 
Sand must be a power of 2;the sections are windowed by a 
%a hanning window;P is a (m/2)x2 matrix whose first column 
is the power spectral density and the second is 

the 95% confidence interval; 


Modified Welch method 


It is evident from Figure 3.8.2 that, if the lengths of the sections are not long 
enough, frequencies close together cannot be differentiated. Therefore, we 
propose a procedure, defined as symmetric method, and its implementation is 
shown in Figure 3.8.3. Windowing of the segments can also be incorporated. 
This approach and the rest of the proposed schemes have the advantage of 
progressively incorporating longer and longer segments of the data and thus 
introducing better and better resolution. In addition, due to the averaging 
process, the variance decreases and smoother periodograms are obtained. 
Figure 3.8.4 shows another proposed method, which is defined as the asymmet- 
ric method. Figure 3.8.5 shows another suggested approach for better resolution 
and reduced variance. The procedure is based on the method of prediction and 
averaging. This proposed method is defined as the symmetric prediction method. 
This procedure can be used in all the other forms, e.g., non-symmetric. The 
above methods can also be used for spectral estimation if we substitute the 
word periodogram with the word correlogram. 
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Figure 3.8.3 Modified symmetric Welch method. 


Figure 3.8.6a shows data given by the equation 


47 


x(n) = sin(0.37n) + sin(0.3247n) + 2(rand(1, 128) — 0.5) (3.8.11) 


and 128 time units. Figure 3.8.6b shows the Welch method using the MATLAB 
function (P=spectrum(x,64)) with the maximum length of 64 units and 


0 N-1 


ry 


regent 
Periodogram 1 


+ 


Periodogram 2 


+ 


Periodogram K 


Total/K Averaging 


PSD estimate 


Figure 3.8.4 Modified asymmetric Welch method. 
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Figure 3.8.5 Modified with prediction Welch method. 


windowed by a hanning window. Figure 3.8.6c shows the proposed asym- 
metric PSD method. It is apparent that the proposed method was successful 
to differentiate the two sinusoids with small frequency difference. However 
the variance is somewhat larger. 


The Blackman—Tukey periodogram with the Bartlett window 
The PSD based on the Blackman—Tukey method is given by 


S_(e”) = 5 w(m) fê (me 


(3.8.12) 


0 otherwise 


Book MATLAB function for the Blackman-Tukey periodogram with 
triangle window 

function [s]=aablackmantukeypsd(x,lg,L) 
%function[s]=aablackmantukeypsd(x,lg,L); 

the window used is the triangle (Bartlett) window; 
$x=data;lg=lag number about 20-40% of length(x)=N; 
%L=desired number of spectral points (bins); 
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Figure 3.8.6 Comparison between Welch method (b) and modified Welch method (c). 


[r]=aasamplediasedautoc(x,1lg); 
n=- (lg-1):1:(1g-1)}; 
w=1-(abs(n)/lg); 
rc=[fliplr(r(1,2:1g)} r]; 
rew=rc.*w; 

s=fft (rew,L); 


3.9 Parametric methods of power spectral estimations 


The PSD of the output of a system is given by the relation (see also Section 3.7) 


BB (V2) « (z) (3.9.1) 


S (2)= H(2)H (1/z )S,(2)= ADA OZ)” 


If {A(n)} is a real sequence, then H(z)= H(z) and (3.9.1) becomes 


_ B(z)B(1/z) 
aren S (z) (3.9.2) 
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Since S (e) = o`, then (3.8.1) takes the form 


Be > BEBET) 











ene Ae®y © Ale) Ae) 
(3.9.3) 
7 ,&(e )bb”e (e°) 
TR es (e° )aa”e (ce) 
1 1 
joo oie 
e (e°)= ih e (e°)= l (3.9.4) 
eiPe elf 
a =[1 a(1) a(2)--- a(p)}' , b=[b(0) b01) --- b (3.9.5) 
p 7 
A(e)=1+ X alkje™™, Bee”) = X blikje” (3.9.6) 


Moving average (MA) process 
Setting a(k) = 0 for k = 1, 2,..., p, then 


S ale) = 02e" (eP )bb"e (e), b=[b(0) bO) = bF (89.7) 


Autoregressive (AR) process 
Setting b(k) = 0 for k = 1, 2, 3,..., q, we obtain 


| 1 
(eP) = 0? 


S - 3.9.8 
yaks : e7 (eP jaa” e (e) l ) 





where b(0) was set equal to one without loss of generality. q 
From the above development we observe that we need to find the 4 
unknown filter coefficients to be able to find the PSD of the output of the 4 
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system. For the AR case the coefficients are found’ using the equations. 


a=-r y a=-R r 
Yp Yp Yp yp 
rO ED rp) a1) r (1) 
00 aO en Cte) a(2) r (2)| (3.9.9) 
yp ~ , a= l F P = 
(pal): 7 (p=2)e- 7 (0) a(p) r (p) 


Then, the variance is found from the relation 
p 
Pay 
r (0) + 2- r(k)a(k) = o? (3.9.10) 


Problems 


Q Z222 

3.1.1 If the cdf is F(x(n))=40.6 -—2<x(n)<1, find its corresponding pdf. 
1 1< x(n) 

3.1.2 Show that the sample mean is equal to the population mean. 


3.2.1 Find the autocorrelation of the rv x(n) = acos(n@+ 0), where a and @ 
are constants and @ is uniformly distributed over the interval —7 to x. 


3.3.1 Prove the following properties of a wide-sense stationary process: 


a) u(n)= u, = constant 
b)  1k)= 1k) 

c) r (m,n) =r (m-n) 
d) r(0)> r (k) 


3.3.2 Using Problem 3.2.1, find a 2 x 2 autocorrelation matrix. 


3.4.1 Find the mean and variance of an rv having the following pdf: 


Fan) = 0N 4) exp(-(x(n) -2)}/4). 
3.4.2 Let the rv x(n) has the pdf 


f(x(n) = L (x(n) +1 -2< x(n)<2 


0 otherwise 


Find the mean and variance of x(n). 
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3.4.3 If the rv x(n) is N(u o°), o? > 0, then show that the rv w, =(x(n)- u0, 
is a N(O, 1). 


3.4.4 The Rayleigh pdfis given by f(x(n))= 


E e” 2 (n)/20? 





"u(x(n)), where u(x(n)) 


is the unit step function. 


a) Plot f(x(n)) by changing the parameter o. 
b) Determine the mean and variance. 


Hints-solutions-suggestions 


3.1.1: 


3.1.2: 


3.2.1: 


r (m,n) = E{a* sin(m@ + @)sin(n@ + 8)} = (1/2)a* E{cos[(m — n)@]— (1/2) 


because the ensemble of the cosine with theta is zero and the other cosine 
is a constant independent of the rv theta. 


3.3.1: 


a) 


c) 


3.3.2: 


E EE T | aor 0i 
20 


f(x) = 0.66(x + 2)+ 0.40(x — 1) 


= naise Sa ES even Sut 


T 


-r 


x cos[(m + n)æ + 26]} = (1/2)a* cos[(m - njo] 





E{x(n+q)} = E{x(m+q)} implies that that the mean must be a constant. 














r (k) = E{x(n + k)x(n)} = Efx(n)x(n + k)} =r (n-n- k) =r (-k) 





r (m,n) = f | x(m)x(n)f (x(m + k)x(n + k))dx(m)dx(n) = 1 (m+k,n+ k) 


-00 —OO 


=r (m-n,0)=r (m-n) 


E{[x(n+ k)— x(n} }=r (0)- 2r (k)+r (0)20 or POr: 


1 cos @ 
R = al | 


CcOS@ 1 
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3.4.1: 


oo 


1 2a 1 Í -y°/4 
= | x——e dx, set x-2 =y >U, =-= | (y+ 27d 
Bn j vár Vár”. i ý 


1 
20 TE E, 
Va 


since y is an odd function and the first integral vanishes. The second integral 
is found using tables. 


= | (x- 27 zn en 2y/ idx 
NÅT 





20-7 /4g JAn = 2 (using tables) 


-r “75 


3.4.2: 


= 





H,= | x(n) f (x(n))dx(n) = EEO +1)dx(n) = 1 2 + sa] 


oO = | ZORGON Ddx(n)=1 


Co 


3.4.3: 
x(n) H 
W(w_)= cdf = Pri <w }=Pr{x(n)Sw o +u} > 
n oO n n H n 


w O +H 








"e (x(n — u) 
W = aneen dx 
(w) o Von exp(— )dx(n). 
Change variables y, EE = 
a: oi me 
W = | —=—e "dy .But = 
(w ) Iam Y DU fw )= 





fæ = pepa) —e0<W <% => 


w is N(0,1) 
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3.4.4: 
b)u, = f x(n) f (x(n)dx(n) = f c ) exp(—x*()/20*)dx(n) 
oo 0 ral 


co 


= -Í x(n)d(e* A ) 
0 


oO 


o t fexpx on) / 20, )dx(n) 


0 





= —x(n)exp(-x° (n) / 207) 


s 
21/207 "N2 


gS feon -0 Z XW exp(-x?(n)/20?)dx(n) 
n n 2 Oo n 


Ht 


= -f (x) -0, {E V d(exp(-x* (n) / 207) 


= ~(x(n)— 0 {7/2} exp(-x(n) / 202) 





+f 2(x(n)- o° Jn/2 ) exp(—x* (1)/2.07 )dx(n) 


a =O. (7/2) = On = o? | dlexp(-x°(n)/20?) = o? at o? 
0 
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Wiener filters 


4,1 The mean-square error 


In this chapter we develop a class of linear optimum discrete-time filters known 
as the Wiener filters. These filters are optimum in the sense of minimizing an 
appropriate function of the error, known as the cost function. The cost function 
that is commonly used in filter design optimization is the mean-square error 
(MSE). Minimizing MSE involves only second-order statistics (correlations) and 
leads to a theory of linear filtering that is useful in many practical applications. 
This approach is common to all optimum filter designs. Figure 4.1.1 shows the 
block diagram presentation of the optimum filter problem. 

The basic idea is to recover a desired signal d(n) given a noisy observation 
x(n) =d(n) + v(n), where both d(n) and v(n) are assumed to be WSS processes. 
Therefore, the problem can be stated as follows: 


Design a filter that produces an estimate d(n) of the desired signal d(n) 
using a linear combination of the data x(n) such that the MSE function 
(cost function) 


J =El(d(n) —d(n))?) = Ele?(n)] (4.1.1) 
is minimized. 


Depending on how the data x(n) and the desired signal d(n) are related, 
there are four basic problems that need solutions. These are: Filtering, 
Smoothing, Prediction, and Deconvolution. 


4.2 The FIR Wiener filter 


Let the sample response (filter coefficients) of the desired filter be denoted 
by w. This filter will process the real-valued stationary process {x(n)} to 
produce an estimate d(n) of the desired real-valued signal d(n). Without loss 
of generality, we will assume, unless otherwise stated, that the processes 
ix(n)}, {d(n)}, etc., have zero mean values. Furthermore, assuming that the 


D2: 
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Figure 4.1.1 Block diagram of the optimum filtering problem. 


filter coefficients do not change with time, the output of the filter is equal to 
the convolution of the input and the filter coefficients. Hence, we obtain 


M-1 
d(n) = Sw, x(n —~m)=w'x(n) 


where M is the number of filter coefficients, and 


w=[w, w, = Wyl x(n)=[x(n) x(n-1) + x(n-M 41) 


The MSE is given by (see (4.1.1)) 


J(w) = Efe*(n)} = Ef[d(n) - w'x(n)}°} 
= E{[d(n) — w'x(n)|[d(n)— w` x(n)" 
= E{d?(n)— w'x(n)d(n) - d(n)x"(n)w + w'x(1)x"(n)w} 
= E{d*(n)}—2w' E{d(n)x(n)} + w'E{x(n)x" (n)}w 


a oad T T 
=0,—2w Ppa tw Rw 
where 


w' x(n) =x (n)}w = scalar 
g% = variance of the desired signal, d(n) 


Pa. = [Pa (0) Px) + PaM- DI 


= cross<orrelation vector between d(n) and x(n) 


Pay(O) = 1,,(0), py) = 7,0), + Py, (M- 1) = 7,,(M - 1) 


(4.2.1) 


(4.2.2) 


(4.2.3) 


(4.2.4) 
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x(n) 
x(n —1) 
R, =E3| . [x(n) x(n-1) =- x(n-M+1)] 
x(n- M +1) 
E{x(n)x(n)} E{x(n)x(n —1)} --- Efx(n)x(n-M+1)} 
E{x(n—1)x(n)} E{x(n-1)x(n-1)} -= EHx(n-1)x(n-M +1) 


1l 


E{x(n— M +1)x(n)} E{x(n — M +1)x(n -1)} --E(x(n- M +1)x(n- M +1)} 
r (0) r (1) = r (M-1) 
FAT) r (0) e PM=2) 


r(-M+1) r(-M+2) --- r (0) 
(4.2.5) 


The above matrix is the correlation matrix of the input data, and it is 
symmetric because the random process is assumed to be stationary, and 
hence, we have the equality, r (k)= r, (—k). Since in practical cases we have 
only one realization, we will assume that the signal is ergodic. Therefore, 
we will use the sample autocorrelation coefficients given in Section 3.2. 


Example 4.2.1: Let us assume that we have found the sample autocorre- 
lation coefficients (7,(0) = 1.0, r,(1) = 0) from given data x(n), which, in addition 
to noise, contain the desired signal. Furthermore, let the variance of the desired 
signal oj = 24.40 and the cross-correlation vector be py,= [2 4.5]'. It is desired 
to find the surface defined by the mean-square function J(w). 


Solution: Substituting the values given above in (4.2.3), we obtain 


2 1 Oj] w, 
J(w) = 24.40-2[w, al iste al | | 
4.2 0 1] w, (4.2.6) 


= 24.40-4w, -9w, + wi +w? 


Note that the equation is quadratic with respect to filter coefficients, and 
it is true for any number of filter coefficients. This is because we used the 
mean-square error approach for the minimization of the error. Figure 4.2.1 
shows the schematic representation of the Wiener filter. The data are the sum 
of the desired signal and noise. From the data we find the correlation matrix 
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Figure 4.1.1 Block diagram of the optimum filtering problem. 


filter coefficients do not change with time, the output of the filter is equal to 
the convolution of the input and the filter coefficients. Hence, we obtain 


d(n) = Y wan —m) = w'x(n) 


m=0 


where M is the number of filter coefficients, and 


w=[W, w, Wy)’, x(n)=[x(n) x(n-1) = x(n- M+)" 


The MSE is given by (see (4.1.1)) 


J(w) = Ele*(n)} = Elld(n)- w'x(n)P} 
= E{[d(n)— w'x(n)][d(n)— w*x(n)]" 
= E{d?(n) — w'x(n)d(n)—d(n)x"(n)w + w'x(n)x"(n)w} 
= E{d*(n)}— 2w' E{d(n)x(n)} + w' E{x(n)x' (n)}w 


= 07 —2w'p,, +w'R, w 
where 


w' x(n) = x'(n)w = scalar 
g% = variance of the desired signal, d(n) 


Pay = [Pa (0) Pa (1) ae PaM- DI 


= cross<orrelation vector between d(n) and x(n) 


Pax (0) a Vax (0), Pa (1) = Vax (1), Hi 1 Pax (M - 1) = Vay (M E 1) 


(4.2.1) 


(4.2.2) 


(4.2.3) 


(4.2.4) 
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x(n) 
x(n- 1) 
R, =E l [x(n) x(n-1) -- x(n- M +1)] 
x(n- M +1) 
E{x(n)x(n)} E{x(n)x(n — 1)} e Efx(njx(n—-M-+1)} 
E{x(n—1)x(n)} E{x(n—1)x(n-D} -= Efx(n-1)x(n-M +1) 


Elx(n- M +1)x(n)}}Efx(n- M +1)x(n-1)} --E(x(n- M + D)x(n- M +1) 
r, (0) r (1) se SCE) 
r1) r, (0) w n (M-2) 


r(-M+1) r(-M+2) -e r10) 
(4.2.5) 


The above matrix is the correlation matrix of the input data, and it is 
symmetric because the random process is assumed to be stationary, and 
hence, we have the equality, r (k) =1r,(-k). Since in practical cases we have 
only one realization, we will assume that the signal is ergodic. Therefore, 
we will use the sample autocorrelation coefficients given in Section 3.2. 


Example 4.2.1: Let us assume that we have found the sample autocorre- 
lation coefficients (r,(0) = 1.0, r,(1) = 0) from given data x(n), which, in addition 
to noise, contain the desired signal. Furthermore, let the variance of the desired 
signal oj = 24.40 and the cross-correlation vector be paą= [2 4.5]! It is desired 
to find the surface defined by the mean-square function J(w). 


Solution: Substituting the values given above in (4.2.3), we obtain 


2 1 0| w, 
Jew) = 24.40- 2w, al fom al | | 
4.2 0 1| w (4.2.6) 


= 24.40- 4w, — 9w, +w +w? 


Note that the equation is quadratic with respect to filter coefficients, and 
it is true for any number of filter coefficients. This is because we used the 
mean-square error approach for the minimization of the error. Figure 4.2.1 
shows the schematic representation of the Wiener filter. The data are the sum 
of the desired signal and noise. From the data we find the correlation matrix 
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d(n) 


v(n) 





d(n) e(n) 


Filter, w 





Figure 4.2.1 A schematic presentation of Wiener filtering. 


and the cross-correlation between the desired signal and the data. Note that to 
find the optimum Wiener filter coefficients, the desired signal is needed. 
Figure 4.2.2 shows the MSE surface. This surface is found by inserting different 
values of w and w, in the function J(w). The values of the coefficients that 
correspond to the bottom of the surface are the optimum Wiener coefficients. 
The vertical distance from the w,-w, plane to the bottom of the surface is known 
as the minimum error, Jin, and corresponds to the optimum Wiener coefficients. 
We observe that the minimum height of the surface corresponds to about w= 2 
and w, = 4.5, which are the optimum coefficients, as we will learn how to find 
them in the next section. Figure 4.2.3 shows an adaptive FIR filter. 
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Figure 4.2.2 The mean-square error surface. 
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x(n) x(n~ 1) x(n- M + 2) x(n—-M+#1) 





Adjusting 
weights 
algorithm 


Figure 4.2.3 An adaptive FIR filter. 


4.3. The Wiener solution 


From Figure 4.2.2, we observe that there exists a plane touching the parabolic 
surface at its minimum point, and it is parallel to the w-plane. Furthermore, 
we observe that the surface is concave upward, and therefore, the first 
derivative of the MSE with respect to w and w, must be zero at the minimum 
point, and the second derivative must be positive. Hence, we write 


Of (ww) _ J (WoW) _ 





OW, i Ow, Pe a 
(4.3.1) 
d (Wy, w) 0°] (wy, w) 
W, d d (w) glee AD) 


For a two-coefficient filter, (4.2.3) becomes 
](W,,W,) = wér (0)+ 2ww r, (1) + wir (0)- 2W tj,(0)— 2w,r,, F o$ 


(4.3.2) 


Introducing (4.3.2) in part (a) of (4.3.1) produces the following set of 
equations 


2wr,(0)+2wir (1)- 2r (0) = 0 (a) 
(4.3.3) 
2w r (0)+2wr,(1)-2r,(1)=0 (b) 
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The above system can be written in the following matrix form, called the 
Wiener—Hopf equation: 


R w°=p (4.3.4 
x dx 


where the superscript “o” indicates the optimum Wiener solution for the 
filter. Note that to find the correlation matrix we must know the second-order 
statistics. If, in addition, the matrix is invertible, which is the case in most 
practical signal processing applications, the optimum filter is given by 


w° =R"p,, (4.3.5) 


For an M-order filter, R, is an M x M matrix, w° is an M x 1 vector, and p 
is an M x 1 vector. 

If we differentiate (4.3.3) once more with respect to wg and w] (i.e., differ- 
entiating /(w) twice), we find that the result is equal to 2r,(0). Since r,(0) = 
E{x(m)x(m)} = o? >0, the surface is concave upward. Therefore, the extreme is 
the minimum point of the surface. Furthermore, if we substitute (4.3.5) in (4.2.3), 
we obtain the minimum error in the mean-square sense (see Problem 4.3.1) 


Toi = o4 = Pa W? F o — PR, Pax (4.3.6) 


which indicates that the minimum point of the error surface is at a distance 
Jmin above the w-plane. The above equation shows that if no correlation exists 
between the desired signal and the data, the error is equal to the variance 
of the desired signal. 

The problem we are facing is how to choose the length of the filter M. 
In the absence of a priori information, we compute the optimum coefficients, 
starting from a small reasonable number. As we increase the number, we 
check the MMSE, and if its value is small enough, e.g., MMSE < 0.01, we 
accept the corresponding number of the coefficients. 


Example 4.3.1: We would like to find the optimum filter coefficients w 
and w, of the Wiener filter, which approximates (models) the unknown 
system with coefficients b)= 1 and b, = 0.38 (see Figure 4.3.1). 


Solution: The following Book MATLAB program was used: 
% example4 3 1 m-file 


v =0.5*(rand(1,20)-0.5);%v=noise vector(20 uniformly 
distributed rv’swith mean zero); 

x=randn(1,20);% x=data vector entering the system and 
Sthe Wiener filter (20 normal 

distributed rv’s with mean zero); 


Dt etic ds died 
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Figure 4.3.1 Illustration of Example 4.3.1. 


sysout=filter([1 0.38],1,x);% sysout=system output 
$with x as input; filter(b,a,x) is a 

MATLAB function, where 

%b is the vector of the coefficients of the ARMA 
gnumerator, a is the vector of 

tthe coefficients of the ARMA denominator (see (2.4.7); 
dn=sysout +v; 

rx=aasamplebiasedautoc(x,2);%book MATLAB function with 
slag=2; 

Rx=toeplitz(rx);%toeplitz() is a MATLAB function that 
gives the symmetric autocorrelation matrix; 
pdx=xcorr(x,dn, ‘biased’);%xcorr() a MATLAB function 
that gives a symmetric biased crosscorrelation; 
p=pax(1,19:20); 

w=inv (Rx) *p'; 

dnc=aasamplebiasedautoc(dn,1);% s2dn=variance of the 
desired signal; 

jmin=dnc-p*w; 


Some representative values found in this example are: R,= [0.9778 
0.0229 0.0229 0.9675], p = [0.3916 1.0050], w = [1.0190 0.3767]; J min = 0.0114. 
We observe that the Wiener filter coefficients are close to those of the 
unknown system. 
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Orthogonality condition 


In order for the set of filter coefficients to minimize the cost function J(w), 
it is necessary and sufficient that the derivatives of J(w) with respect to w, 
be equal to zero for k =0, 1, 2,..., M-1, 








s = ie Efe(n)e(n)} = 2E{e(n) = = 0 (4.3.7) 
But 
M-1 
e(n) = d(n)— È w,x(n-m) (4.3.8) 
m=0 
and, hence, it follows that 
od) = —x(n—-k) (4.3.9) 
dw, 
Therefore, (4.3.7) becomes 
Efe (n)x(n — k)} = 0 k=0,1,2,.., M-1 (4.3.10) 


where the superscript ”o” denotes that the corresponding w;,’s used to find the 
estimation error (n) are the optimal ones. Figure 4.3.2 illustrates the orthogo- 
nality principle, where the error e(n) is orthogonal (perpendicular) to the data 
set {x(n)} when the estimator employs the optimum set of filter coefficients. 






e°(n) = d(n) — dèn) 


wox (7) 








x(n- 1) 


Figure 4.3.2 Pictorial illustration of the orthogonality principle. 
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4.4 Wiener filtering examples 


The examples in this section will illustrate the use and utility of the Wiener 
filters. 


Example 4.4.1 (Filtering): Filtering of noisy signals (noise reduction) is 
extremely important, and the method has been used in many applications, 
such as speech in noisy environment, reception of data across a noisy chan- 
nel, enhancement of images, etc. 

Let the received signal be x(n) =d(n) + v(n), where v(n) is a noise with 
zero mean, variance o-, and it is uncorrelated with the desired signal, 
d(n). Hence, 


pa (m) = E{d(n)x(n — m)} = E{d(n)d(n — m)} + E{d(n)}E{o(n - m)} | 
4.4.1 
= E{d*(m)} = r,(m) 


Similarly, we obtain 
r(m)= E{x(n)x(n—m)} = nn) +r, (m) (4.4.2) 


where we used the assumption that d(n) and v(n) are uncorrelated, and v(n) 
has zero mean value. Therefore, the Wiener-Hopf equation (Equation 4.3.4) 
becomes 


(Ri +R,)W° =p,, (4.4.3) 


The following Book MATLAB program was used to produce the results 
shown in Figure 4.4.1. 


sexample4 4 1 m-file 

n=0:511; 

d=sin(.1*pi*n) ;%desired signal 
v=0.5*randn(1,512);%white Gaussian noise; 
x=d+v;%input signal to Wiener filter; 
rd=aasamplebiasedautoc (d,20);%rdx=rd=biased autoc. 
S6function of the desired signal(see 4.4.1); 
rv=aasamplebiasedautoc(v,20);%rv=biased autoc. func- 
$tion of the noise; 
R=toeplitz(rd(1,1:12))+toeplitz(rv(1,1:12));%see(4.4. 
53); 

pdx=rd(1,1:12); 

w=inv(R)*pdx'; 

y=filter(w',1,x);%ďoutput of the filter; 
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x(n), y(n) 





Figure 4.4.1 Illustration of Example 4.4.1. 


But 

o? =0} +0}; (0. =7,(0), o} =7,(0), oF =r (0) (4.4.4) 
and, hence, from the MATLAB function var we obtained var(d) = var(x) — 
var(v) = 0.4968 and Jmin = 0.4968 — pg,w° = 0.0320. 

We can also use the Book MATLAB function [w,jmin] = aawienerfirfil- 
ter(x,d,M), to obtain the filter coefficients and the minimum MSE. 

Example 4.4.2 (Filtering): It is desired to find a two-coefficient Wiener 
filter for the communication channel shown in Figure 4.4.2. Let v,(”) and 
v,(n) are white noises with zero mean, uncorrelated with each other and with 
d(n), and have the following variances: o? = 0.31, ø% = 0.12. The desired signal 
produced by the first filter shown in Figure 4.4.2 is 

d(n) = —0.796d(n— 1) +v, (n) (4.4.5) 


Therefore, the autocorrelation function of the desired signal becomes 
E{d(n)d(n)} = 0.7967 E{d*(n - 1)} — 2 x 0.796E{d(n — 1)}E{v, (n)} + Elo? (n)} 


OT 


o? =0,796o2 +0? or o7=0.31/(1-0.7967)=0.8461 (4.4.6) ] 
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Figure 4.4.2 Illustration of Example 4.4.2. 


From the second filter we obtain 
d(n) = u(n)— 0.931u(n—- 1) (4.4.7) 
Introducing (4.4.7) in (4.4.5) we obtain 
u(n)— 0.135u(n—1)—-0.7411u(n — 2) =v (n) (4.4.8) 


But x(n) = u(n) + v,(n) and, hence, the vector form of the set becomes x(n) = 
u(n) + v,(n). Therefore, the autocorrelation matrix R, becomes 





Elx(n)x"(n)}=R, = Effu(n)+ v, (nut n+ vi (n =R, +R, (4.4.9) 





where we used the assumption that u(n) and v,(7) are uncorrelated zero-mean 
random sequences, which implies that E{v,(m)u'(n)} = Ef{v.(m)}E{u(n)} = 0. 

Next, we multiply (4.4.8) by u(m—m) and take the ensemble average of 
both sides, which results in the following expression 


r (m)—0.135r,(m~1)—0.7411r (m—2)=r,,(m) = E{ 


v,(n)u(n—m)} (4.4.10) 
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Setting m = 1 and m = 2, the above equation produces the following 
system: 


et om bevel ines 
= = Yule~Walker equation (4.4.11) 
r) 7,(O) || 0.7411 =y (2) 


since y(n) and u(n—m) are uncorrelated. If we set m = 0 in (4.4.10), it becomes 
r (0)-0.135r (-1)- 0.7411r (2) = Ef{v, (n)u(n)} (4.4.12) 


If we, next, substitute the value of u(n) from (4.4.8) in (4.4.12), taking 
into consideration that v and u are independent rv’s, we obtain 


o? =r, (0)—0.135r, (1) - 0.74111, (2) (4.4.13) 


where we used the symmetry property of the correlation function. From the 
first equation of (4.4.11) we obtain the relation 


0.139 0.135 
oe „O= 2589% 


7 (4.4.14) 
(=O7411 





Substituting the above equation in the second equation of the set of (4.4.11), 
we obtain 


ry Ox Pa Ge cae 
i 0.2589 





o} + 0.74110? (4.4.15) 


Hence, the last three equations give the variance of u 


ae °F 0.31 


= = = 0.9445 4.4.16 
Fu 0.3282 0.3282 ( ) 





Using (4.4.16), (4.4.14), and the value o; = 0.12, we obtain the correlation 
matrix 


0.9445 0.4925 0.12 0 1.0645 0.4925 
R =R, v2 + = 


(4.4.17) 4 
O 0.12) | 0.4925 1.0645 4 


0.4925 0.9445 


From Figure 4.4.2 we find the relation 


u(n) —0.931u(n—1) = d(n) (4.4.18) 
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Multiplying (4.4.18) by u(n) and then by u(n-1), and taking the ensemble 
averages of the results, we obtain the vector py, equals to 


Pa = [0.4860 - 0.3868]’ (4.4.19) 


Minimum mean-square error (MSE) 


Introducing the above results in (4.2.3), we obtain the MSE-surface (cost 
function) as a function of the filter coefficients. Hence, 


0.4860 1.0645 0.4925 


](w) = 0.8461- 2[w, w] +[w, w] 
-0.3868 0.4925 1.0645 


= 0.8461 + 0.972w, —0.7736w, +1.0645w3 +1.0645w; + 0.985w,w, 
(4.4.20) 


The MSE surface and its contour plots are shown in Figure 4.4.3. 


Optimum filter (w°) 
The optimum filter is defined by (4.3.5), and in this case takes the following form: 


1.1953 -0.5531 || 0.4860] | 0.7948] |w, 
w° =Rp,, = = (4.4.21) 
-0.5531 1.1953 || -0.3868 | |-0.7311] |w, 


J(w) 





Figure 4.4.3 The MSE surface and its corresponding contour plots of Example 4.4.2. 
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Figure 4.4.4 System identification set up. 

The minimum MSE is found using (4.3.6) that, in this case, gives the value 
Jags = o; -Pan R, Par 

1.1953 -— veal Hie 


= 0.8461 - [0.4860 — 0.3868] 
—0.5531 1.1953 


0.3868 


= 0.1770 
(4.4.22) 


Example 4.4.3 (System identification): It is desired, using a Wiener filter, 
to estimate the unknown impulse response coefficients h,’s of a FIR system 
(see Figure 4.4.4). The input {x(n)} is a zero mean iid rv’s with variance oe 
Let the impulse response h of the filter be: h=[0.9 0.6 0.2]". Since the input 
{x(n)} is zero mean and iid rv’s, oe correlation matrix R, is a diagonal matrix 
with elements having values o*. The desired signal d(n) is the output of the 
unknown filter, and it is given by (see Section 2.4): d(n) = 0.9x(n) + 0.6x(n — 1) 
+ 0.2x(n — 2). Therefore, the cross-correlation output is given by: 


p(t) = E{d(n)x(n—i)} = E{{0.9x(n) + 0.6x(1 —1)4+ 0.2x(n — 2)|x(n —1)} 
= 0.9E{x(n)x(n—i)} + 0.6E{x(n —1)x(n—1)} + 0.2E{x(n— 2)x(n—1)} 
= 0.97, (i) + 0.6r,(i-1) + 0.27, (x — 2) 
(4.4.23) 


Hence, we obtain (r (m)=0 for m#0): p,,(0)= 0.902, p,,(1) = 0.60%. The opti- 
mum filter is 


Tı olf 09 0.9 
w° = R pa = (02) ao) , and the MMSE is 
0 110.6 0.6 


(assuming 0% =1 ) 
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1 0} 0.9 
Tin =O, 10.9 0.6] . 
1 || 0.6 


But, 


o, = E{d(n)d(n)} = E{[0.9x(n) + 0.6x(n — 1) + 0.2x(n — 2} 


= 0.81 + 0.36 + 0.04 = 1.21 and, hence, J_,, = 1.21 — (0.9% + 0.67) = 0.04. 


Book MATLAB function for system identification (Wiener filter) 
function[w, jm] =aawienerfirfilter(x,d,M) 

sfunction[w, jm]=aawienerfirfilter(x,d,M) ; 

sx=data entering both the unknown filter(system) and 
the Wiener filter; 

¢d=the desired signal=output of the unknown system; 
length(d)=length(x); 

$M=number of coefficients of the Wiener filter; 
$w=Wiener filter coefficients; jm=minimum mean-square 
Serror; 

pdx=xcorr(d,x, 'biased'); 

p=pdx (1, (length (pdx) + 1)/2: ( (length(pdx)+1)/2)+M-1); 
rx=aasamplebiasedautoc(x,M); 

R=toeplitz (rx); 

w=inv(R)*p'; 

jm=var(d)-p*w;% var() is a MATLAB function; 


By setting, for example, the following MATLAB procedure: x = 
randn(1,256); d = filter([0.9 0.2 - 0.4],1,x); [w,jm] = aawienerfirfilter(x,d,4); we 
obtain: w = [0.9000 0.2000 —0.3999 —0.0004], J ni, = 0.0110. We note that, if we 
assume a larger number of filter coefficients than those belonging to the 
unknown system, the Wiener filter produces close approximate values to 
those in the unknown system and produces values close to zero for the 
remaining coefficients. 


Example 4.4.4 (Noise canceling): In many practical applications there 
exists a need to cancel the noise added to a signal. For example, we are using 
the cell phone inside the car and the noise of the car or radio is added to 
the message we are trying to transmit. A similar circumstance appears when 
pilots in planes and helicopters try to communicate, or tank drivers try to 
do the same. Figure 4.4.5 shows pictorially the noise contamination situa- 
tions. Observe that the noise added to the signal and the other component 
entering the Wiener filter emanate from the same source but follow different 
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e(n) =x(n)— v (n) 
Signal source x(n) = d(n) + v(n) = d(n)+ v,(n) ~- v, (n) 
(pilot) 





Noise source 
(cockpit noise) 


Figure 4.4.5 Illustration of noise canceling scheme. 


paths in the same environment. This indicates that there is some degree of 
correlation between these two noises. It is assumed that the noises have zero 
mean values. The output of the Wiener filter will approximate the noise 
added to the desired signal and, thus, the error will be close to the desired 
signal. The Wiener filter in this case is 


R w°=p (4.4.24) 


because the desired signal in this case is v}. 
The individual components of the vector P, ,, are 


Pa», (m) = E{v,(n)o,(n — m)} = E{(x(n) ~ d(n))v,(n— m)} 


= E{x(n)v,(n~ m)}— Eld(n)v,(n- m)} = p, (m) ree 
Because d(n) and v,(1) are uncorrelated, 
E{d(n)v,(n—m)) = E{d(n))E{v,(n—m)} = 0. 
Therefore, (4.4.24) becomes 
R, w’=p,,, (4.4.26) 


To demonstrate the effect of the Wiener filter, let d(n) = 0.99" sin(0.1nz + 
0.27), v,(n)=0.80,(n—1)+ (nm) and v,(n) =—0.95v,(n—-1)+ v(n), where v(n) is 
white noise with zero mean value and unit variance. The correlation 
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Figure 4.4.6 Illustration of Example 4.4.4. 


matrix R, and cross-correlation vector p,,, are found using the sample biased 
correlation equations 


N-1 


reall = Deamon k=0,1,::-,K-1, K&N 

vi (4.4.27) 
Pe =E X xv, (0-b k=0,1,---,K-1l, K&N 

n=0 


Figure 4.4.6 shows simulation results for three different-order filters using 
the Book MATLAB given below. 


Book MATLAB function for noise canceling 


function[d,w, xn] =aawienernoisecancelor(dn,al,a2,v,M,N) 
&[d,w, xn] =aawlenernoisecance= 
S6lor(dn,al,a2,v,M,N);dn=desired signal; 

al=first order IIR coefficient,a2=first order IIR 
Scoefficient; 

$v=nolise;M=number of Wiener filter coefficients ;N=num- 
ber of sequence 
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$elemets of dn(desired signal) and v(noise) ;d=output 
desired signal; 
$w=Wiener filter coefficients;xn=corrupted sig- 
$nal;en=xn-vl=d; 
v1(1)=0;v2(1)=0; 
for n=2:N 
vl (n)=al*vl(n-1)+vi(n-1); 
v2 (n)=a2*v2(n-1)+vi(n-1); 
end; 
v2autoc=aasamplebiasedautoc(v2,M) ; 
xn=oner vl 
Rv2=toeplitz(v2autoc) ; 
pl=xcorr(xn,v2,'biased') ; 
if M>N 
disp(['error:M must be less than N']); 
end; 
R=Rv2(1:M,1:M); 
p=p1 (1, (length(p1)+4+1)/2: (length(pl1)+1)/2+M-1) ; 
w=inv(R)*p'; 
yw=filter(w,1,v2); 
d=xn-yw(:,1:N); 


Example 4.4.5 (Self-correcting Wiener filter (SCWF)): We can also 
arrange the standard single Wiener filter in a series form as shown in 
Figure 4.4.7. This configuration permits us to process the signal using filters 4 
with fewer coefficients, thus saving in computation. Figure 4.4.8a shows | 
the input to the filter, which is a sine wave with added Gaussian white | 
noise. Figure 4.4.8b shows the output of the first stage of a self-correcting 
Wiener filter and Figure 4.4.8c shows the output of the fourth stage of the 
self-correcting Wiener filter (each stage has ten coefficients). 





Figure 4.4.7 Self-correcting Wiener filter (SCNF). 
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Figure 4.4.8 Illustration of Example 4.4.5. 


Problems 
4.3.1 Verify (4.3.6). 


4.3.2 Find the Wiener coefficients wọ and w; that approximate (models) the 
unknown system coefficients (see Figure 4.3.1), which are by = 0. 9 and 
b, = 0.25. Let the noise {v(1)} be white with zero mean and variance 07 = 0.15, 
Further, we assume that the input data sequence {x(n)} is stationary white 
process with zero mean and variance o? =1. In addition, {v(n) and {x(n)} are 
uncorrelated and {v(7)} is added to the output of the system under study. 


4.3.3 Find Jmin using the orthogonality principle. 


4.4.1 Let the data entering the Wiener filter are given by x(n) = d(n) + v(n). 
The noise v(n) has zero mean value, unit variance, and is uncorrelated with the 
desired signal d(n). Furthermore, assume 7,(m) = 0.9" and r,(m) = 6(m). Find 
the following: Pax, W, Jmin; Signal power, noise power, signal-to-noise power. 
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Figure p4.4.3 Wiener filters of Problem 4.4.3. 


4.4.2 Repeat Example 4.4.4 with the difference that a small amount of the 
signal is leaking and is added to v,(m) noise. Use the following quantities 
and state your observations: 0.05d(n), 0.1d(n), 0.3d(n), and 0.6d(n). 


4.4.3 Find the cost function and the MSE surface for the two systems shown 
in Figure p4.4.3. Given: Ef{s*(n)} = 0.9, Ef{s(n)s(n—1)} = 0.4, E{d*(n)} = 3, 
E{d(n)s(n)} = —0.5, and E{d(n)s(n—1)} = 0.9. 


Hints-solutions-sug gestions 
4.3.1: 


peek, oT Oo 2 TD <0 — e2 TL o\T wo 2 O\T ...0 
Jan = 07 ~2W° R, W =0, —w’ R w =0;-(R w°) w°=0;-(R w°) w 
=04 -p wW =0,-p,,R'p,, (R, is symmetricand wand p, are vectors) 


4.3.2: 


Efx? (m) aasi } 
R = = 


0 
i j (1). Since {x(n)} and {v(n)} 


© [Elx Dreh EP) 
are white processes implies that 


E{x(n-1)x(n)} = E{x(n)x(n - 1)} = Etx(n — 1)x(n)) = E{x(n)}E{x(n - 1)} = 0, 
E{x*(n)} = E{x*(n — 1)} = 1, E{x(n)o(n)} = E{x(n)}E{v(n)} = 0, 


E{x(n — 1)v(n)} = E{x(n — 1)JE{v(n)} = 0, 
we obtain 


Effo(n) + 0.9x(n) + 0.25x(n — 1)]x(n)} | 0.907 H | 
= = (2), 


Pa E{[v(n) + 0.9x(n) + 0.25x(n — 1)]x(n — 1)} j 0.2502 0.25 
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o? = Eld’ (n) = E{[o(n) + 0.9x(n) + 0.25x(n — DJom) + 0.9x(n) + 0.25x(n—1)]} 
= Elv?(n)} + 0.81E{x2(n)} + 0.0625E{x2(n —1)} = 0.15 + 0.81+ 0.0625 = 1.0225 


(3). 
Introducing (1), (2), and (3) in (4.2.3) we obtain 


J =1.0225-2[w, w] +[w, w] 
0.25 O EE 


= 1.0225 - 1.8w, —0.5w, + w + wi 


The optimum Wiener filter is given by (see (4.3.5)) 


1 00.9 0.9 
w=R Pa = = , and the minimum error is given by 
= 10 0.25) 0125 


0.9 
Jain = 02 — pl, w = 1.0225 -[0.9 025) | = 1.0225 - 0.8725 = 0.15. 
0.25 


J = Ele(n)[d(n) - ce m)l} = Efe(n)d(n Pon E{e(n)x(n — m)). If the 
m=0 m=0 
coefficients have their optimum value, the orthogonality principle states that 
E{e(n)x(n — m)} = 0 and, 


M-i 
hence, J... = Ele(nd(n)} = E{d(n)d(n) - S wix(n —m)d(n)} = 


m=0 


M-1 M-i 


-X w Ela(n)x(n— m) = 0} -Y w, pap) = 10- Pg," = 140) -PRR Pav 
m=0 m=0 
4.4.1: 


pa (m) = Efd(n)x(n — m)} = E{d(n)d(n— m)} + Ed (n) Eton- m) =r (m) (1) 


since d(n) and v(n) are independent and WSS. Also, r (m) = E{x(n) 
x(n —m)} =r; (m)+r,(m), where again the independence and the zero mean 
properties of the noise were introduced. Therefore, the Wiener equation 


76 Adaptive filtering primer with MATLAB 


becomes: Ra + R, = pax From the relation pg,(m) = r3(m) (see (1)), and the 4 
given relations in the problem, we find that the Wiener equation and its 3 
inverse are given by 


2 09|[w,] fi w| 1 [119] [0.373 
092 læ l loo! jw] 319109 | }o.282} 
The minimum error is 


Jee g= Pa R W? =7(0)- Pa W° = 0.373. Since 1,(0) = o3 =1 and om =1, 


the power of the desired signal and noise are equal and, hence, 10log(1/1) = 0. 
After filtering x(n) we find that the signal power is 


A 1 0.9 |} w 
Eld (O}= wR w =[w w = 0.408, and the noise power 
0.9 1 wy 
is 
A 1 0 We 
E{v*(0)}= wR, w° =[we wi] = 0.285. 
0 we 


Therefore, SNR = 10log(0.408/0.285) = 1.56, which shows that the Wiener 
filter increased the SNR by 1.56 dB. 


4.4.3: 
a)j = Edn) —[s(n) + w, s(n - DP) 
= E{d*(n) —[s(n) + w s(n- DF — d(n)[s(n) + w s(n- 1)]) 
= E{d*(n)} ~ Els? (n)}) - w Els" (n — 1)} -2w ,E{s(n)s(n — 1)} - 2E{d(n)s(n)} 
-2w Els(n)s(n — 1)) 


=3-0.9— w* 0.9 -2w 0.4+2x0.5-2w 0.4=3.1- 0.9w* -0.6w, > 


2 =~ 0.9x 2w, - 1.6 = 0= w, = 0.889. (b) Similar to (a) 
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Eigenvalues of R, — properties 
of the error surface 


5.1 The eigenvalues of the correlation matrix 


Let R, be an M x M correlation matrix of a WSS discrete-time process 
computed from a data vector x(n). This symmetric and non-negative matrix 
(see Problem 5.1.1) can be found using MATLAB as follows: we find the data 
vector x(n) = [ —0.4326 -1.6656 0.1253 0.2877 —1.1465 1.1909 1.1892 
-0.0376 0.3273 0.1746 ] using the MATLAB command: x = randn(1, 10). 
From. this data vector we find its autocorrelation function r,(m) = [0.7346 
0.0269 —0.1360] using the Book MATLAB function: rx = aasamplebiasedau- 
toc(x, 3). Next, we obtain the correlation matrix 


0.7346 0.0269 — 0.1360 
R, = | 0.0269 0.7346 0.0269 


—0.1360 0.0269 0.7346 


using the MATLAB function: R, = toeplitz(rx). Having found the correlation 
matrix R,, we wish to find an M x 1 vector q such that 


R.q= Aq (5.1.1) 


for some constant A. The above equation indicates that the left-hand side 
transformation does not change the direction of the vector q but only its 
length. The characteristic equation of R, is given by 


det(R,q — AD) = 0 (5.1.2) 
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The roots 4,,A4,,°++,A,, of the above equation are called eigenvalues of R,. 
Therefore, each eigenvalue corresponds to an eigenvector q; such that 


R q, = Aq, i=1,2,::-,M (5.1.3) 
The MATLAB function 


[Q, D] = eig(R); (5.1.4) 


gives the diagonal matrix D (D = A) containing the eigenvalues of R (R = R} 
and gives the matrix Q with its columns equal to the eigenvectors of R. 
Using the 3 x 3 matrix above in (5.1.4), we obtain 


0.6842 0.1783 -0.7071 0.5886 0 0 
Q=|-0.2522 0.9677 -0.0000 |, D= 0 0.7445 0 
0.6842 0.1783 0.7071 0 0 0.8706 


In Table 5.1.1 below, we give the eigenvalue properties of WSS processes. 
Continuing with our correlation matrix using MATLAB, we also find that 


>Q(:,1) * QC,3) = 5.551le — 17 % Q (:, 1)’= row of the first column 


% of Q, Q (1:,3) = third column of Q, 
% the results verify the orthogonality 


% property; 
Table 5.1.1 Eigenvalue Properties 
x(n) =[x(n) x(n-1)--x(n-M+1)] Wide-sense stationary stochastic process 
R, = E{x(n)x"(n)} Correlation matrix 
Ài The eigenvalues of R, are real and positive 
q;4q, =0 Two eigenvectors belonging to two different 


eigenvalues are orthogonal 


Q'Q=Il Q=[q, q qma]; Q is a unitary matrix 
M-1 
R, =QAQ? = 2 Aq; q; A=diag[A, A, À, Aya] 
fr{R,}= 5 À; tr{R} = trace of R = sum of the diagonal 
: i=0 


elements of R 
Ax 25, = max S_(e!”) 


pia -ASOR 
à. >S™" = min S (e”) 
x x 


-AEWA 
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0.7346 0.0269 — 0.1360 
Q*A*Q' =] 0.0269 0.7346 0.0269 |= R, as it should be. 
— 0.1360 0.0269 0.7346 


5.2 Geometrical properties of the error surface 


The cost function J (see Section 4.2) can be written in the form 
w'R w-2p'w-(J-07%) = 0) (5.2.1) 


If we set values of J > Jn the w plane will cut the second order surface, for 
a two-coefficient filter, along a line whose projection on the w-plane are 
ellipses arbitrarily oriented as shown in Figure 5.2.1. To obtain the contours, 
we used the following Book MATLAB program: 


woH—-3 20.0523 

wl=-3:0.05:3; 

[x,y]=meshgrid(w0,wl1); 
jJ=0.8461+0.972*x-0,773*y+1.0647*x.^2+1.064*y.^2+0.985x.*y; 
contour (x,y,j3,30);%30 is the number of desired contours 





Figure 5.2.1 MSE contours in the w- and &-planes. 
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If we introduce the MATLAB function contour(j, [2.3 3.1  5]) in the 
command window, we produce three contours at heights 2.3, 3.1, and 5. 

The first simplification we can do is to shift the origin of the w-axes to 4 
another one whose origin is on the center of the ellipses. This is accomplished 4 
using the transformation: € = w — w°. If we introduce this transformation in 
(5.2.1) and setting J = 2J min we obtain the relationship (see Problem 5.2.1) 


ETR €-2]_. +0, -pw =0 (5.2.2) 
But om —p'w° =] a and thus, the above equation becomes 


E'R é=] n (5.2.3) 


We can further simplify (5.2.3) by a linear transformation such that the major 
axes are aligned with a new set of axes, (&, 6). The transformation is 
accomplished by the linear relationship 





E=QE or G=Q'E (5.2.4) 


where Q is a matrix whose columns are the eigenvectors of the corre- 
lation matrix R,. The matrix Q is unitary having the property: Q" = Q7 
(see Table 5.1.1). Substituting next (5.2.4) in (5.2.3) and setting R, = QAQ" 
or Q'R Q =A, we obtain the equation 


ETAG = Jin (5.2.5) 
The matrix is a diagonal matrix whose elements are the eigenvalues of the 


correlation matrix R,. 


Example 5.2.1: Let A, = 1, A, =0.5, and J min = 0.67. The ellipse in the (&, &) 
plane is found by solving the system 





Ee <4 = = =(0.67 or Si P to i‘. 
e Slo osle | [0.67/1 0.67/0.5 | 
(5.2.6) 


where 0.67/0.5 is the major axis and 0.67/1 is the minor one. Hence, for 


the case J = 2J min the ellipse intersects the & axis at 0.67 and the & at 1.34. 
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If we start with (5.2.1) and apply the shift and rotation transformations, 
we obtain the relationships 


J=] „n +(w- w°) R (w-w°)=] „tE RE 
(5.2.7) 
= J an tE (QAQ E= J on ETAG 


Notes: 


e The contours intersect the €’-axes at values dependent upon the 
eigenvalues of R, and the specific MSE value chosen. The rotation 
and translation do not alter the shape of the MSE surface. 

e If the successive contours for the values 2J nin 3Jmin etc., are close to 
each other, the surface is steep, which in turn indicates that the 
mean-square estimation error is very sensitive to the choice of the 
filter coefficients. 

e Choosing the filter values w is equivalent to choosing a point in the 
w-plane. The height of the MSE surface above the plane at that point 
is determined only by the signal correlation properties. 


Problems 


5.1.1 Prove that a correlation matrix from a WSS process is positive definite. 
That is, a'Ra > 0 for any vector a. 


5.1.2 Show that if A,, 4, ... Ay denote the eigenvalues of the correlation 
matrix R,, then the eigenvalues of R* are equal to 4f, A} , = At, for any k> 0. 


5.1.3 Show that if the eigenvectors qy, qo, --- Gu correspond to distinct eigen- 
values A,, A, ... , Ay of an M x M matrix R, then the eigenvectors are 
linearly independent. 


5.1.4 Show that if the M x M correlation matrix R, has M eigenvalues, these 
eigenvalues are real and non-negative. 


5.1.5 Show that if the correlation matrix R, has M distinct eigenvalues and 
eigenvectors then the eigenvectors are orthogonal to each other. 


5.1.6 Show that an M x M correlation matrix R, can be written in the form 
Q¥R,O = A, where Q = [q q .-- Gy] is a matrix whose columns are the 
eigenvectors of the correlation matrix and A is a diagonal matrix whose 
elements are the distinct eigenvalues of the correlation matrix. 


5.1.7 Show that the trace of the correlation matrix is equal to the sum of its 
eigenvalues. 


5.2.1 Verify (5.2.2). 
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5.2.2 Find the ellipses if the following data are given: 


R aa a 28 
= 7 = ,O,= : 
X 1 3 Pax 7 d 


5.2.3 A Wiener filter is characterized by the following parameters: 


let 


o% =2. It is requested to explore the performance surface as the ratio 29/24 
varies. This is accomplished using different values of a and b. 


Hints-solutions-sug gestions 
5.1.1: 
@ReeCHj Ely a= Ella aa) = Ete xy) 20 


5.1.2: 

Repeated pre-multiplication of both sides of (5.1.1) by the matrix R, yields 
R *q = A*q. This shows that if Ais an eigenvalue of R, then A‘ is an eigenvalue | 
of R* and every eigenvector of R, is an eigenvector of RK. 


5.1.3: 
Suppose that z &q;= 0 holds for certain not all zero scalars a,’s. Repeated mul- 


tiplication of this equation and ene results of Problem 5.1.2, the following 
set of M equations are found: (on AN gq, =0,k =1,2,--.,M. This set of equa- 


tions may be written as follows: [æ q, @,q, = @,,4,,]V =9 (1). 
L oA A: sane, one 
1 A; AS ves ae 
The matrix V = l is a Vandermonde matrix, which is 


2 M-1 
ee 


nonsingular (has an inverse) if the eigenvalues are distinct. Next, we post- 
multiply (1) by V~ to obtain [a,q, oq, = &,q,,]=0. Equating the corre- 
sponding columns, we find that a@,q; = 0. Since q,’s are nonzero, the equations 
can be satisfied if the a,’s are zero. This contradicts the assumption, and thus, 
the eigenvectors are linearly independent. 


5.1.4: 

The 7 eigenvalue is found from the relation Rq; = àq; for i= 1, 2, ... M(1). 
Pre-multiplying (1) by q” (H stands for the Hermitian transpose and is 
substituted with T for real symmetric matrices) we find qR,q, = Aq” q; 
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Because the matrix is positive definite, the left-hand side of the equation 
is positive or equal to zero. The eigenvalue product on the right-hand side 
is the Euclidean norm (distance), and it is positive. Hence, the eigenvalues 
are positive or zero (non-negative). 


5.1.5: 

Two eigenvectors are orthogonal if qq, = 0 i # j. From the eigenvalue-eigen- 
vector relationship we write: R q, = 4,q, (1) R „q= = 44, (2). Multiplying (1) by 
the Hermitian form of the j eigenvector, we obtain q” Rq, =å, q; q; (3). Next 
we take the conjugate transpose of (2) and incorporate the conjugate trans- 
pose equivalence property of the correlation matrix. We, next, multiply these 
results with q; from the right to find the relation q; R, q; =À; q; q; (4). Sub- 
tracting (3) from (4) we obtain the relation (A, — A, jq” q, =0 (5). Because the 
eigenvalues are distinct, (5) implies that the vectors are orthogonal. 


5.1.6: 

We assume that the eigenvectors are orthonormal (see Problem 5.1.5). For 
each eigenvalue and the corresponding eigenvector, there is the relation- 
ship R q; =4,q,,1=1,2,---,M (1). Hence, the set of equations of (1) can be 
written in the form R Q = QA (2) because the eigenvectors are orthonormal 
we find that Q"O = I (3). Equivalently we may write (3) as follows: Q7 = 
Q! (4). If a matrix obeys condition (4) we say that the matrix is unitary. 
Multiplying (2) by Q! from the left, we obtain the relation: QĦRQ = A. This 
transformation is called the unitary similarity transformation. 


5.1.7: 


M 
tr{Q"R Q} = tr{A}= Xa, = fr{R_Q"Q} = tr{R_I}=tr{R} 
i=l 


5.2.1: 

(w — w°+ w°)'R(w — w? + w°) — 2pl(w — we + w°) ~ 2). + Og = (6+w°)! 
R(é+w°) — 2p'(E+w°) — 2Jain + 0,7. With Rw°= p, w°'p = p'w° = number and 
0,°— p'w° = J min the equation is easily proved. 





5.2.2: 
2-A 1 

JA Jee CG TAC and Js. = -p r-a- =e-48-3-1=0, 
1 3-A 

A,=1.382, 


2 14%] 16) (wi 1114 11/5 
A, = 3.618 =| |, = Jp, = 28-[6 7] = 3.6 
1 Siar L7] ta [8 8/5 


1328 0 We 
J=36+[E €] = 3.6 + 1.382 +3.618Ẹ 
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5.2.33 


1 d a|| W 
The MSE surface is given by: J = 2- 2[w, a p [w, a | ,and the 
1 ad 





“a 
4 = 
ad r d aj |1 d+a 
optimum tap weights are: =R p= = . Therefore, the 
w a d 1 1 
d+a 








1 = 
minimum MSE surface is: J _ =o; -w° p=2-[ i | ] eL 
TEUN d+a d+a 


We also have the relation 





_ eT o „d+a+1 d al) S 
Je Teg SW) ROW = 2 +16, 5j j z 


d-À a 


a d-À 
and A, = d ~ a. By incorporating different values for a and d we can create 
different ratios of the eigenvalues. The larger the ratio the more elongated 
the ellipses are. 


From the relation = 0, we obtain the two eigenvalues A, = d + a, 
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Newton and steepest-descent 
method 


6.1 One-dimensional gradient search method 


In general, we can say that adaptive algorithms are nothing but iterative 
search algorithms derived from minimizing a cost function with the true 
statistics replaced by their estimates. To study the adaptive algorithms, it is 
necessary to have a thorough understanding of the iterative algorithms and 
their convergence properties. In this chapter we discuss the steepest descent 
method and the Newton’s method. 

The one-coefficient MSE surface (line) is given by (see (5.2.7)) 


Jw) = J n +r (O\(w-w’y (6.1.1) 


and it is pictorially shown in Figure 6.1.1. The first and second derivatives are 


oJ (w) =2r(0({w-w°) a); eee) =2r (0)>0 b) (6.1.2) 
ow ow j 


Since at w = w° the first derivative is zero and the second derivative is greater 
than zero, the surface has a global minimum and is concave upward. To find 
the optimum value of w, we can use an iterative approach. We start with an 
arbitrary value w(0) and measure the slope of the curve J(w) at w(0). Next, 
we set w(1) to be equal to w(0) plus the negative of an increment proportional 
to the slope of J(w) at w(0). Proceeding with the iteration, we will eventually 
find the minimum value w’. The values w(0), w(1), ... , are known as the 
gradient estimates. 


85 





86 Adaptive filtering primer with MATLAB 


Initial guess 





Figure 6.1.1 Gradient search of one-dimensional MSE surface. 


Gradient search algorithm 


Based on the above development, the filter coefficient at iteration n, w(n), is 
found using the relation 


_ Ow) 


ow 





at | (6.1.3) 
= w(n)+ p[-V](w(n))] = w(n) — 2ur (0)(w(n)- w°) 


w(n 1)= wal 


where y is a constant to be determined. Rearranging (6.1.3), we find 
win +1)=(1— 2ur (0))w(n)+ 2ur (0)w? (6.1.4) 


The solution of the above difference equation, using the iteration approach 
(see Problem 6.1.1), is 


w(n) = w® + (1-2ur (0) (w(0)- w°) (6.1.5) 


The above equation gives w(n) explicitly at any iteration in the search pro- 
cedure. This is the solution to the gradient search algorithm. Note that if we 
had initially guessed w(0) = w, which is the optimum value, we would have 
found w(1) = w and this gives the optimum value in one step. 

To have convergence of w(n) in (6.1.5) we must impose the condition 


[1 — 2ur, (0) <1 (6.1.6) 
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The above inequality defines the range of the step-size constant u so that the 
algorithm will converge. Hence, we obtain 


-1<1-2yr (0) <1 or 0< 2ur (0) <2 or O<u oF (6.1.7) 
r 


Under this condition, (6.1.5) converges to the optimum value w as 
n— oe. If u > 1/r (0), the process is unstable and no convergence takes place. 

When the filter coefficient has a value w(n) (i.e., at iteration n), then the 
MSE surface (here a line) is (see (6.1.1)) 


I(n)= J, + 7(0)(w(n) - w yY (6.1.8) 


Substituting the quantity w(n) — w° of (6.1.5) in (6.1.8) we obtain 
Jn) = J nin + 7 (O)(t0(0) — w}? (1- 2ur (0))*" (6.1.9) 


The above two equations show that w(n)w° as n increases to infinity, and the 
MSE undergoes a geometric progression toward J,,;,. The plot of the perfor- 
mance surface /(n) vs. the iteration number n is known as the learning curve. 


Newton's method in gradient search 


Newton’s method finds the solution (zeros) to the equation f(w) = 0. From 
Figure 6.1.2 we observe that the slope at w(Q) is 


_ dfw) 


dw 


__ f(w(0)) 
w=w{0) w(0) _ w(1) (6.1.10) 





f w0) 


f(w) 
f(w) 





Tangent 


f[w(0)] 


f[w(1)] 


w (1) w (0) w 


Figure 6.1.2 Newton’s method for finding a zero of f(w). 
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where w(0) is an initial guess. The above equation is the result of retaining 
the first two terms of Taylor’s expansion and setting the rest of them equal 
to zero. This equation can be written in the form 


f(w(0)) 
f'(w(0)) 
From (6.1.11), it is clear that if we know the value w(0) and the values 


of the function and its derivative at the same point, we can find w(1). Hence, 
the n™ iteration of the above equation takes the form 


w(1) = w(0)- 





(6.1.11) 


fln) 
fwn) 


But f’(w(n)) =[f(w(n))- fwn -1)/Nwmn)-w(n-1)], and hence, (6.1.12) 
becomes 


w(n+1)= w(n)- n=0,1,2 +> (6.1.12) 


= y(n - LE MwNn) -wn = DI 
w(n+1)= w(n) Muay Fae =D) (6.1.13) 


As we have mentioned above, Newton’s method finds the roots of the 
function f(w). That is solving the polynomial f(w) = 0. However, in our case 
we need to find the minimum point of the performance surface (here line). 
This is equivalent to setting 0/(w)/dw = 0. Therefore, we substitute the deriv- 
ative 0J(w)/dw for f(w) in (6.1.12) and for f’(w) we substitute the second 
order derivative 0*](w)/dw?. Hence, (6.1.12) becomes 


dJ (w(n)) 
dw(n) 
d°(w(n)) 
ðw (n) 


w(n+1)= w(n)- n=0,1,2--- (6.1.14) 


Newton's multidimensional case 


In the previous chapter, it was found that the optimum filter is given by (see 
(4.3.5)) 


Ww = RP. (6.1.15) 
Furthermore, (4.3.1a), (4.3.3a), and (4.3.3b) can be written in the form 
dj(w) 
dW, r (0) r (1) || w, r (0) 
zA -2| — (6.1.16) 
oJ (w) r (1) 7, (O) |, w tl) 


aw, ) 
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or equivalently for M coefficients in the vector form (rax = Pax) 





V](w)=2R w-2p,, V=| 2% (6.1.17) 


The reader should refer to Appendix A for vector and matrix differentiations. 
Multiplying (6.1.17) by the inverse of the correlation matrix and using 
(6.1.15), we find 


w° =w- UR’ V](w) (6.1.18) 


where we have substituted the constant 1/2 with the step-size parameter. 
If we introduce the new gradient R'VJ (w) in (6.1.3), we obtain 


w(n +1) = w(n)- URTV] (w(n)) (6.1.19) 


This equation is a generalization of (6.1.3). Based on the observed correspon- 
dence between quantities we can substitute w’s with their vector equivalents, 
the autocorrelation r,(0) with the correlation matrix R, and 2r,(0)(w(n) — w°) 
with the gradient vector of J(w). Hence, we obtain 


w(n+1)= w(n)—2uR (w(n)- w’) = win) - LVI (w) (6.1.20) 


The presence of R, causes the eigenvalue spread problem to be exaggerated. 
This is because, if the ratio of the maximum to the minimum eigenvalues 
(i.e., eigenvalue spread) is large, the inverse of the correlation matrix may have 
large elements, and this will result in difficulties in solving equations that 
involve inverse correlation matrices. In such cases, we say that the correlation 
matrices are ill-conditioned. 

To overcome the eigenvalue spread problem, Newton’s method simply 
Substitutes u with uR in (6.1.20) to obtain (6.1.19). This substitution has the 
effect of rotating the gradient vector to the direction pointing toward the 
minimum point of the MSE surface as shown in Figure 6.1.3. 
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-R7147 (w) 


Figure 6.1.3 The gradient and Newton vectors. 


Substituting (6.1.17) in (6.1.19) we obtain 


w(n +1) = w(n)- uR? (2R w(n)-2pPa) = (1-2u)w(n)+ 2R Pa 


(6.1.21) 
= (1—- 2ujw(n)+ 2uw° 
Subtracting w° from both sides of the above equation we find 
w(n+1)-w° =(1-2u(w(n)—-w°) (6.1.22) 


For n=0 and #=1/2, we obtain w° in one step. However, in practice 
VJ(w) and R are estimated and, therefore, the value of the step-size 
parameter must be less than 0.5. 

Setting w -w° = É in (6.1.22) we find the relation 


E(n+1)=(1-2p)E(n) (6.1.23) 
which has the solution (see also Problem 6.1.1) 


E(n) =(1-—2y)"€(0) or w(n)—w® =(1- 2u) (w(0)—-w°) (6.1.24) 
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Using (5.2.7) in connection with (6.1.24) we obtain (see Problem 6.1.3) 


JO)= Lat a2)" UO) 1a) (6.1.25) 


where J(0) is the value (height) of the performance surface when w is equal 
to w(n) at iteration n. 
To obtain a decaying equivalent expression, we introduce the relation 


(1-2u)"=e* (6.1.26) 


where is the time constant. Under the condition 24<<1, we can use the approx- 
imation In(1 — 2u) = —2u. Therefore, the time constant has the value 


i 
4u 


T 


IR 


(6.1.27) 


The above equation shows that Newton’s algorithm has only one mode 
of convergence (one time constant). 


6.2 Steepest-descent algorithm 


To find the minimum value of the MSE surface, Jmn using the steepest 
descent algorithm, we proceed as follows: a) We start with the initial value 
w(0), usually using the null vector. b) At the MSE surface point that corre- 
sponds to w(0), we compute the gradient vector V/(w(0)). c) We compute the 
value —uVJ(w(0)) and add it to w(0) to obtain w(1). d) We go back to step 
b) and continue the procedure until we find the optimum value of the vector 
coefficients. 

If w(n) is the filter-coefficient vector at step n (time), then its updated 
value w(n + 1) is given by (see also (6.1.3)) 


w(1+1)= w(n)- UV] (w(n)) (6.2.1) 
The gradient vector is equal to (see (6.1.17)) 

VJ (w(n)) = —2p,, + 2R w(n) (6.2.2) 
and, hence, (6.2.1) becomes 


w(n+1)= w(n) + 2u[p, — Rwa) = w) + ulpa -R w) 


6.2.3 
=[I- w R lwn) + UPa pea 
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Figure 6.2.1 Signal-flow-graph representation of the steepest-descent algorithm. 


where we set u’ = 2y, and I is the identity matrix. The value of the primed 
step-size parameter must be much less than 1/2 for convergence. The signal- 
flow-graph of (6.2.3) is shown in Figure 6.2.1. 

To apply the method of steepest descent, we must find first estimates of 
the autocorrelation matrix R, and the cross-correlation vector Pa, from the 
data. This is necessary, since we do not have an ensemble of data to find R, 
and Pax 


Stability (convergence) of the algorithm 
Let 


én) = w(n)- w° (6.2.4) 


be the difference between the filter-coefficient vector and its optimum Wiener 
value w°. Next we write the first part of (6.2.3) in the form 


w(n+1)- w° = w(n)- w° + uw [R w°" -R w(n)] or &n+1)=[I- u R léin) 


(6.2.5) 
But since R, = QAQ' (see Table 5.1.1) and I = QQ", (6.2.5) becomes: 
E(n +1) = [I - QAQ" (n) 
or 
Q*E(n+1) =[I- wA] Eln) (6.2.6) 
or 


f'(n+1)=[1- wA] (n) 
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where the coordinate axis €’ ’s are orthogonal to ellipsoids (see Chapter 5), 
and is a diagonal matrix with its diagonal elements equal to the eigenvalues 
of R,. The k row of (6.2.6), which represents the k® natural mode of the 
steepest descent algorithm, is (see Problem 6.2.1) 


E(n+1)=[1- uA Je (1) (6.2.7) 


The above equation is a homogeneous difference equation, which has the 
following solution (see also Problem 6.1.1) 


E (n)=(1- pa YE; k=0,1,--,M=1, n=1,2,3,-- (6.2.8) 
For the above equation to converge (to be stable) as n > œ, we must set 
alei=wA<1. k=0 L2 Mel (6.2.9) 


From the right side of the inequality we obtain —p’A, <0 or p’ >0, since A, 
are real and positive. From the left side of the inequality we obtain 
wA,<2 or p’<2/A,, and hence 


O<p'<2/A, k=0,1,--,M-1 or O<p<1/A, k=0,1,---,M-1 
(6.2.10) 


Under the above conditions and as n > ©, (6.2.8) becomes lim ¢, (1) = 0 
or €=0 since Q" #0 and, thus, w(cc) = w°. Since (6.2.8) decays exponentially 
to zero, there exists a time constant that depends on the value of u’ and the 
eigenvalues of R,. Furthermore, (6.2.8) implies that immaterially of the initial 
value €(0), w(7) always converges to w° provided, of course, that (6.2.10) is 
satisfied. This is an important property of the steepest-descent algorithm. 
Since each row of (6.2.7) must decay as n > œ, it is necessary and sufficient 
that yt’ obeys the following relationship 


O<p'< — (6.2.11) 


max 


Since (1— p’A,)" decays exponentially, there exists an exponential function 
with time constant t, such that (ev Y =(1-p’,)" or 1-p’A, = pie 


eae - a --- Therefore, for small p’ and A, (large T,), we have 








WA, <<1 (6.2.12) 
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In general, the k time constant t, can be expressed in the form 


1 
= — —___—_ 6.2.1 
Ty In(1 a w,) ( 3) 


Transient behavior of MSE 


Using (5.2.7) at time n we obtain the relation 


M-1 
(wn) = Jan FET NE’ = Jain + AEE (1) (6.2.14) 
k=0 
Substituting the solution for é (n) from (6.2.8) in (6.2.14), we find the relation 
M-1 
JWA) = Jorn + >, Ag(L— WA," EO) (6.2.15) 
k=0 


It is obvious from the above equation (the factor in parenthesis is assumed 
to be less than one) that 


lim J(w(1t)) = Join (6.2.16) 


From (6.2.14) we observe that the learning curve (plot of /(w(7)) vs. n) consists 
of a sum of exponentials, each one corresponds to a natural mode of the 
algorithm. 
Solution of the vector difference equation 
If we set n = Q in (6.2.3), we obtain 
w(1) =[I- w R Jw(0)+ KPa (6.2.17) 
Similarly, if we set n = 1, we find 
w(2)= [I y’R, IWO) +N Pa (6.2.18) 
If we substitute (6.2.17) in (6.2.18) we obtain 


w(2)=[I- u R w(0)+ up + T-w’R IWP, 


1 
| (6.2.19) 
pa 7 2 7 7 
=[I- xR _] worl > [I-p R |y Pa 

j=0 
Therefore, at the n*® step we obtain 


w(n)=[I- uR |" w(0) + ye -y’R ] uo, (6.2.20) l 


Fa 
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The above equation does not provide us with a way to study the convergence 
of w(n) to w° as n > %. We must decouple the equations and describe them 
in a different coordinate system. To accomplish this task we translate and 
then rotate the coordinate system. 

After finding the eigenvalues and eigenvectors of R,, we create a diag- 
onal matrix consisting of the eigenvalues of R, and a matrix Q made up of 
the eigenvectors of R, (see Chapter 5). Since QTQ = QQ! = I, (6.2.3) takes the 


form 
w(1 +1) = Q[I- w’AJQ’w(n)+ KPa (6.2.21) 


To uncouple the weights (coefficients), we multiply both sides of the above 
equation by Q!. Hence, 


w (n+1)=[I- wA]w (n) + KP (6.2.22) 
where we define the following quantities 
w(n+1)=Q'w(n +1), w(n) =Q*w(n), pi, =Q"py., W” =Q"w° (6.2.23) 
Next, we obtain the relation 
w= Q7w° = QTR Pa = Q"(QAQ™) Pa 
= Q'QA'O'p,. = Api, 


since Q” = QT and (QAQ™)? =(AQ")"Q* =(QT)7A7Q" = QA“Q". 
The ¿t equation of the system given by (1.2.21) is 


(6.2.24) 


w(nt+1)=[1-wA,Jw'(n)+u'p,,  OSiSM-1 (6.2.25) 


By iteration, the above equation has the following solution (see Problem 
6.2.2): 


wi(n)=(1— wA, y w (0) + WPi Xe ~ uy] (6.2.26) 
j=0 


If we set æ, =1—p’A,, then (6.2.26) becomes 


n-1 
wn) =a wO +U P D al = wO) + HP 


j=0 


(6.2.27) 


meee: 
idx 
Tsg. 


n 


, ae d l-g 1 7 n 
wn) = UPa Eg l-A] (6.2.28) 





since the sum is a finite geometric series. 
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Example 6.2.1: For the development of this example we used the help of 
MATLAB. Hence, the data were found using x = randn(1,10), and the desired 
ones were found using d= conv([1 0.2],x) or d = filter([{1 0.2],1,x). The correla- 
tion matrix was found using the function R = toeplitz(xc(1,10:11)), where xc = 
xcorr(x, biased’). Similarly, we found the pdx cross-correlation. Hence, 


0.7346 0.0269 
0.0269 0.7346 


| [Q, A] = eig(R); 


f 


—0.7071 0.7071 0.7077 0 
0.7071 0.7071 


0 0.7071 


and pw’ < 2/0.7071 = 2.8285 for the solution to converge. We next choose 
fw, wl = [0 0] and hence w’(0) = Q' w(0) = 0. Therefore, (6.2.27) becomes 


, Paat lsg 1 , , it 
Wi) = Hie Tog J a OAS] 





From (6.2.23) we obtain 


: ia | ne a 
Pa. FQ Pa = = 


0.7071 0.7071 || —0.0003 0.5230 


Therefore, the system is (we set u’ = 3 for convergence) 





1 
—0.5234)[1 - (1-2 x 0.7071)” 
arr 1 — ( )"] 


w(t) = 07 





w)(n) = 0.5230{1-(1-2 x 0.7614)" ] 


0.7614 


Since w’(n) = Q'w(n) and QT = Q” we find w(n)=Q*w’(n) and at the limit 
value when n approaches infinity, the filter coefficients take the values 


—0.5234 
we -0.7071 0.7071 || 97977 | | 1.0087 
=Q'w’= = 
0.7071 0.7071 || 0.5230 —0.0373 
0.7614 


Problems 
6.1.1 Verify (6.1.5). 


6.1.2 Plot (6.1.5) for the following positive values of u: (a) 0 < 4< z (b)M= 
CETT << a with 7,(0) = 1. Identify the type of your plots. 


1 
2r (0) 
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6.1.3 Verify (6.1.25). 
6.1.4 Using Newton’s algorithm, find the third root of 8. Start with w(0) = 1.2. 


6.1.5 Let the MSE is given by the non-quadratic equation J = 2 - + 
[1 — w’)(4.5 + 3.5w)]. Find and plot the recursive Newton’s algorithm for 
the coefficient w(n), n = 0, 1, 2, ... Start with w(0) = 


6.2.1 Verify (6.2.7). 


6.2.2 Verify (6.2.26). 


. j -07 
6.2.3 Let R, -| | and P, = [0.7 0.5]' are the estimates derived 
0.7 1 
from the data. Find the vector w(n). 


6.2.4 Find an equivalent expression of J(w(7n)) = J „n +¢’(n)A€"(n) as a func- 
tion of w(7), Pa(7), and R, 


1 0.85 
6.2.5 The correlation matrix of the filter input is Ro | 
0.85 1 


eigenvalues A, = 1.85 and A, = 0.15. Plot the learning curve in a semi-log 
format and find the two time constants. Compare the results obtained from 
the graph and those given analytically (see (1.2.12)). 


6.2.6 Plotl1—w/A,,land|1-y’A_. | vs. uw’ for two “hypothetical values of 
A nin = 9.4 and Anax = 0.8. Find the optimum value Hoyi , which is given by the 


max 


intersection of the two curves. All the other values of ‘the eigenvalues create 
lines between the above two. When pl’ = uiy, the speed of convergence of 
the steepest-descent algorithm is determined by the factora=1-2u’ A 


opt” “min ` 


Hints-solutions-suggestions 
6.1.1: 


w(1) = aw(0)+ bw (a = 1— 2 ur (0), b = 2ur(0)), w(2) = aw(1) + bw? 
= a(aw(0)+ bw’) + bw? 


= a w(0)+ abw° + bw > w(n) = a"w(0)+ (a! +a? +-+ 1)bw? 





1 n 
= a” w(0) + £ bw’, 
1-a 


but 1 - a =b > w(n)=a"w(0)+ w° -a w° = w° +(w(0)-w’ Ja’. 
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6.1.2: 
Answer: (a) over-damped case. (b) critically damped. (c) under-damped case. 


6.1.3: 
n) = Join + § RE = Jmn + (1 - 20)" E (O)R, (1 — 212)" E0) 
= J nin + (1 — 2u)” E (OR E0) 
where (5.2.7) and (6.1.24) were used. But &'(0)R,&(0) = J(0) - J. and Prob- 
lem 6.1.3 is proved. | 


6.1.4: 
Hint: f(x) = x? — 8, hence x(n + 1) = x(n) — k(n) - 8]/[3x2(n)] 
6.1.5: 


w(n +1) = w(n)—[((wyew o = w(n) — LO2w Wn) + Put) 3.5 





21w(n)+9 
6.2.1: 
o(2 +1) -Oat Ay 0 OARD OAA) 
(n +1) 0 1-0 0 A, -- 0 E (1) (1— pA, X(n) 
; = SH F , 
E (n+1) 0 0.1 0 0- Aa Ei (=a a 


and the results are found since the equality of two vectors means equality 
of each corresponding element. 


6.2.2: 
Start with n = 0 = w'(1)=(1- WA, w (0)+ pp’, next set n =1= w'(2) 
=(1- BA) WA Jw O) + Wp, I+ UP, 
= (1- WA,rw'(0)+ Wp, 1+ 1-H) 
2-1 . 
=(1- wa PWO p| D -AY 
j=0 
and, therefore, at the n' iteration we obtain (1.2.26). 
6.2.3: 
Using the MATLAB function [v,d] = eig(R), we obtain the following eigen- 


values and eigenvectors: A, = 0.3000, A, = 1.7000, q, = [-0.7071 0.7071]', q, = 
[0.7071 0.7071]'. The step-size factor must be set equal to: g < 2/1.7 = 1.1765, 
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so that the solution converges. Choosing w(Q0) = 0, implies that 
w’(0)=Q'w(0)=0 = 


—0.7071 0.7071 || 0.7 —0.1414 
Pa. =Q Pa = = 
0.7071 0.7071 || 0.5 0.8485 


1 PENES a rene 
wi (n) = T (—0.1414)[1 - (1 - u’0.3Y ]; W, (0)= 17 0.8485[1 — (1 - 1.7)" ]. 
Since 


w (1) = Q'w(n), Q" = Q™, w(n) = Q' wn), we find 


bea p ge 53 COMA- (1 - 1/0.3)" 


0.7071 0.7071 |} 1 


w, (n) 75 0.8485[1 - (1 - 1.7)" 














0.7 
we (0.14141 - (1 - 0.3)" 4 oe 0.848501 — ERT aE 
Ii 03 i7 
A (-0.1414)[1 — (1 - ’0.3)"]+ e 0.8485[1 - (1 —’1.7)"] 


6.2.4: 


j(w(n)) = 0} -p3 w° + &(n)QQ*R,QQ"E(n) 


-p W° +[w"(n)- w° JR, [w(n)~ w?] 
= T o oT oTyy T o 
=0, -p W +w’? R,w(n)-w" R,w(n)-w (nR w 
+w”R w°. 
But (R w°) =w"'R' =w"R, =p} RW =P PIW 
=w°p a and, hence, 
J(w(n)) = 07- 2w" (n)p +w (nR w(n). 


6.2.5: 
Hint: The time constants are found from the two slopes of the graph. 


6.2.6: 
Hint: Hopi =2/A n +4 


max } a= CO 


Thi SA) A aA T) 


max 


chapter 7 





The least mean-square 
(LMS) algorithm 


7.1 Introduction 


In this chapter, we present the celebrated least mean-square (LMS) algorithm, 

developed by Widrow and Hoff in 1960. This algorithm is a member of 

stochastic gradient algorithms, and because of its robustness and low com- 

putational complexity, it has been used in a wide spectrum of applications. 
The LMS algorithm has the following most important properties: 


T; 


It can be used to solve the Wiener-Hopf equation without finding 
matrix inversion. Furthermore, it does not require the availability of 
the autocorrelation matrix of the filter input and the cross correlation 
between the filter input and its desired signal. 

Its form is simple as well as its implementation, yet it is capable 
of delivering high performance during the adaptation process. 

Its iterative procedure involves: a) computing the output of an FIR 
filter produced by a set of tap inputs (filter coefficients), b) generation 
of an estimated error by comparing the output of the filter to a desired 
response, and c) adjusting the tap weights (filter coefficients) based 
on the estimation error. 

The correlation term needed to find the values of the coefficients 
at the n + 1 iteration contains the stochastic product x(n)e(n) without 
the expectation operation that is present in the steepest-descent method. 
Since the expectation operation is not present, each coefficient goes 
through sharp variations (noise) during the iteration process. There- 
fore, instead of terminating at the Wiener solution, the LMS algorithm 
suffers random variation around the minimum point (optimum 
value) of the error-performance surface. 

It includes a step-size parameter, u, that must be selected properly 
to control stability and convergence speed of the algorithm. 

It is stable and robust for a variety of signal conditions. 
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7.2 Derivation of the LMS algorithm 


In Chapter 6 we developed the following relations using the steepest-descent 4 
method 
















w(n +1) = w(n)— LV] (w(n)) 


V](w(n)) =—2p,, + 2R,w(n) (7.2.2) l 





The simplest choices of estimators for R, and pa, are the instantaneous : 
estimates defined by: : 


R, =x(n)x"(n), py, =4(n)x(n) (7.2.3) 4 





Substituting the above values in (7.2.2) and then combining (7.2.1) and (7.2.2), 
we obtain 4 





w(n +1) = w(n) + 2ux(n)[d(n) — x" (n)w(n)] e 
= w(n)+ 2ux(n)ld(n)- w” (n)x(n)] (7.24) | 
= w(n) + 2p1e(n)x(n) 1 


where 
y(n) =w'(n)x(n) filter output 
e(n)=d(n)-y(n) error 
w(n)=[w,(n) w,(n)--w,,,(]" fiter taps at time n 
x(n) =[x(n) x(n-1) x(n-2)-- x(n-M+1)]? inputdata (7.2.8) ] 


The algorithm defined by (7.2.4), (7.2.5), and (7.2.6) constitute the adaptive 4 
LMS algorithm. The algorithm at each iteration requires that x(n), d(n), and 4 
w(n) are known. The LMS algorithm is a stochastic gradient algorithm if the ¢ 
input signal is a stochastic process. This results in varying the pointing 4 
direction of the coefficient vector during the iteration. An FIR adaptive filter J 
realization is shown in Figure 7.2.1. Figure 7.2.2 shows the block-diagram 4 
representation of the LMS filter. Table 7.2.1 presents the LMS algorithm. 4 


Book LMS MATLAB function 


function[w,y,e,J]=aalms(x,dn,mu,M) 
$functionl[w,y,e,J]=aalms(x,dn,mu,M) ; 
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(n) x(n — 1) _1 |x(n-M +1) 


x(n-M+1) 





2 ue(n)x(n) + wo(n) = Wo(n +1 


Wo(n)x(n) + Wy (m)x (nm — 1) 


2 e(n) = 2 pld(n)— y(n) 


Figure 7.2.1 FIR LMS filter realization. 


all quantities are real-valued; 
%x=input data to the filter; dn=desired signal; 


M=order of the filter; 


mu=step-size factor; x and dn must be of the same 


length; 

N=length (x); 

y=zeros(1,N); %initialized output of the filter; 
1 


w=zeros(1,M); %initialized filter coefficient vector; 


for n=M:N 
xl=x(n:-1:n-M+1); %gfor each n the vector x1 is 


w(n) 





w(n+ 1) 


Figure 7.2.2 Block diagram representation of the LMS algorithm. 
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% of length M with elements from x in reverse order; 
y(n) =w*x1'; 
e(n)=dn(n)-y(n); 
w=wt+2*mu*e(n) *x1; 

end; 

J=e.°2;%7 is the learning curve of the adaptation; 


The book MATLAB function aalms1.m provides the changing values of 
the filter coefficients as a function of the iteration number n. The learning 
curve is easily found using the MATLAB command: J = e.42; 


Book LMS MATLAB function to provide changing values 


function[w,y,e,J,wl]=aalmsi1 (x,dn,mu,M) 
Sfunction[w,y,e,J,wl]=aalmsi1(x,dn,mu,M) ; 

this function provides also the changes of two filter 
coefficients versus iterations; 

$all quantities are real-valued; 

$x=input data to the filter; dn=desired signal; 
S$M=order of the filter; 

Smu=step-size; x and dn must be of the same length; 
Seach column of the matrix wl contains the history 
of each filter coefficient; 

N=length(x); 

y=zeros({1,N); 


w=zeros(1,M); %Sinitialized filter coefficient vector; 
for n=M:N 
xl=x({n:-1:n-M+1); %for each n the vector xl of 


$length M is produced with elements 
from x in reverse order; 
y(n) =w*xl’; 
e(n)=dn(n)-y(n) ; 
w=wt+2*mu*e(n) *x1; 
wl (n~M+1,:)=w(1,:); 


end; 


J=e.*2;%7J is the learning curve of the adaptive 
Sorocess; each column of the matrix wl 
depicts the history of each filter coefficient; 


7.3 Examples using the LMS algorithm 


The following examples will elucidate the use of the LMS algorithm to 
different areas of engineering and will create an appreciation for the versa- 
tility of this important algorithm. 
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Table 7.2.1 The LMS Algorithm for an M"-Order FIR Adaptive Filter 


Inputs: M = filter length 
U = step-size factor 
x(n) = input data to the adaptive filter 
w(0) = initialization filter vector = 0 


Outputs: y(n) = adaptive filter output = w'(n)x(n) = d(n) 
e(n) = d(n) — y(n) = error 
w(n + 1) = w(n) + 2ue(n)x(n) 


Example 7.3.1 (Linear Prediction): We can use an adaptive LMS filter 
as a predictor as shown in Figure 7.3.1. The data x(n) were created by passing 
a zero-mean white noise v(1) through an autoregressive (AR) process described 
by the difference equation: x(n) = 0.6010x(m — 1) — 0.7225x(n — 2) + v(n). The 
LMS filter is used to predict the values of the AR filter parameters 0.6010 
and —0.7225. A two-coefficient LMS filter predicts x(n) by 


x(n) = b w. (n)x(n-1-i)= y(n) (7.3.1) 


Figure 7.3.2 shows the trajectories of w) and w, vs. the number of itera- 
tions for two different values of step-size (u = 0.02 and u = 0.005). The 





Figure 7.3.1 (a) Linear predictor LMS filter. (b) Two-coefficient adaptive filter with 
its adaptive weight-control mechanism. 
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Wo, LL = 0.005 


Wy H=0.005 
da Wp H = 0.02 


a y LT NER TORANA z "PAAS 


0 200 400 600 800 =. 1000 1200 1400 1600 1800 2000 
n 





Figure 7.3.2 Convergence of two-element LMS adaptive filter used as linear predictor. 


adaptive filter is a two-coefficient filter. The noise is white and Gaussian 
distributed. The figure shows fluctuations in the values of coefficients as 
they converge to a neighborhood of their optimum value, 0.6010 and 0.7225, 
respectively. As the step-size 4 becomes smaller, the fluctuations are not as 
large, but the convergence speed to the optimal values is slower. 


Book one-step LMS predictor MATLAB function 


function[w,y,e,J,wi]=aalmsonesteppredictor (x,mu,M) 
$function[w,J,wl]=aalmsonesteppredictor(x,mu,M) ; 
$x=data=Signal plus noise;mu=step size factor;M=number 
of filter 
scoefficients;wl is a matrix and each column is the 
Shistory of each 
filter coefficient versus time n; 
N=length (x); 
y=zeros(1,N); 
w=zeros(1,M); 
for n=M:N-1 
x1l=x (n:-1:n-M+1); 
y(n)=w*x1'; 
e(n)=x (n+l) -y 
w=w+2*mu*e (n) 
wi (n-M+1,:)=w 


(n 

xx 

hel ae 

end; 

J=e.^2; 

%J is the learning curve of the adaptive process; 


Example 7.3.2 (Modeling): Adaptive filtering can also be used to find 
the coefficients of an unknown filter as shown in Figure 7.3.3. The data x(n) 
were created similar to those in Example 7.3.1. The desired signal is given 4 
by d(n) = x(n) — 2x(n — 1) + 4x(n — 2). If the output y(n) is approximately q 


R 
A 
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Wo + wz! + Wz? + wz > 


Figure 7.3.3 Identification of an unknown system. 





equal to d(n), it implies that the coefficients of the LMS filter are approxi- 
mately equal to those of the unknown system. Figure 7.3.4 shows the ability 
of the LMS filter to identify the unknown system. After 500 iterations, the 
system is practically identified. For this example we used p = 0.01. It is 
observed that the fourth coefficient is zero, as it should be, since the system 
to be identified has only three coefficients and the rest are zero. 


Example 7.3.3 (Noise Cancellation): A noise cancellation scheme is 
shown in Figure 7.3.5. In this example, we introduce the following values: 
H,(z) = 1 (or h(n) = 6(n)), v(m) = white noise = v(n), L = 1, s(n) = sin(0.27n). 
Therefore, the input signal to the filter is x(n) = s(n — 1) + v(m — 1) and the 
desired signal is d(n) = s(n) + v(n).The Book LMS MATLAB program named 
aalms was used. Figure 7.3.6 shows the signal, the signal plus noise, and the 
outputs of the filter for two different sets of coefficients: M = 4 and M = 12. 


Example 7.3.4 (Power Spectrum Approximation): If a stochastic process 
is the output of an autoregressive (AR) system when its input is a white 
noise with variance 0%, e.g., 





Figure 7.3.4 LMS adaptive filter coefficients for system modeling. 
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Figure 7.3.5 Adaptive LMS noise cancellation scheme. 


TAAA, 


-1 
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s(n) = sin(0.1 mn) 


n 


s(n) + v(n) 
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© 


Figure 7.3.6 Noise cancellation of adaptive LMS filter. 
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the power spectrum corresponding to the stochastic process is given by (see 
Kay, 1993) 


2 





: o`. 
S(e”) = Ae (7.3.3) 
where 
` iok 
j@) -jo 
A(e”) =1- Die (7.3.4) 
Equation (7.3.2) is also written in the form 
N 
v(n) = x(n) - »y a x(n- k) (7.3.5) 
k=1 


where v(n) is the non-predictable portion of the signal or the innovations of 
the AR process. Because v(n) is the non-predictable portion of x(n), it suggests 
to use an adaptive linear predictor for spectrum estimation. If the stochastic 
process is the result of an AR process, the LMS filter coefficients will be much 
closer to those of the AR system, and the two spectra will also be very close 
to each other. 

The steps that closely approximate the coefficients of (7.3.2), using an 
LMS adaptive filter, are: 





1. Use the adaptive LMS filter in the predictive mode (see Figure 7.3.7) 
2. Average the K most recent values of Ù 
3. Compute the power spectrum 


(7.3.6) 











Figure 7.3.7 LMS adaptive filter for power spectrum estimation. 
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Let an exact stochastic process be created by an AR system having poles 
at (Z,,Z,)= 0.95e’"/". To find the difference equation, which characterized the 
AR system, apply the definition of the system in the z-domain. Hence, we 
write 


output _ X(z) 
input oV (2) 
1 1 


fe Vi) 1 1.343527 +0.902527 
1-—0.95e 4z~ || 1—0.95e tz 


The above equation can be written as: 


H(z)= 


X(z)— 1.34352 'X(z) + 0.902527 X(z) = 07 V(z) 


Taking the inverse z-transform of both sides of the above equation, we obtain 
the difference equation describing the AR system given by 


x(n) = 1.3435x(n — 1) - 0.9025x(n — 2) + a2 0(n) (7.37) 
The power spectrum is given by (see (7.3.3)) 


o? 


S (e) = ——_____2—_____”’ 
1 —1,3435¢ +4 0.9025e/?0| 
o? 


v 


f ~ 1.343527 + 0.902527 


2 
z=e® 


(7.3.8) 


Figure 7.3.8 shows the true spectrum and the approximate one. We 
assumed that the desired signal was produced by the AR filter given by 
(7.3.7). The approximate spectrum was found using the following constants: 
u = 0.02, M = 3, N = 1000, avn = 3, x(n) = dn(n — 1), and o? = 0.3385. The 
function aapowerspctraav1 will average the output w over a number of times 
as desired by the reader. If we had guessed M = 2 and avn = 5, the two 
curves will be approximately equal and the filter coefficients are also approx- 
imately equal: 1.3452, —0.8551. 


Book MATLAB function to obtain power spectra 
function[wal,v]=aapowerspectraavil (al,a2,a3,mu,M,N, avn, vr) 
S6aapowerspectraavl (al,a2,a3,mu,M,N,avn,vr) ; 
wa=zeros(1,M); 
dn=zeros(1,N) ;x=zeros(1,N); 
for .K=Leavn 

for n=4:N 


v(n)=vr* (rand-.5); 
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Figure 7.3.8 Power spectrum approximation using the LMS filter. 


dn(n)=-al*dn(n-1) -a2*dn (n-2) -a3*dn(n-3)+v(n); 
x(n)=dn(n-1); 
end; 
[w]=aalms(x,dn,mu,M); 
Wa=watw; 
end; 
wal=wa/avn; 
S$this function gives averaged w's to be used for 
finding 


$the approximate spectrum of the output of an AR filter 


Sup to the third order;M=number of LMS coefficients; 
%N=length of desired signal and input signal to LMS 
sfilter; 

S$avn=number of times w's are averaged;the function is 
easily 

%modified for AR filter with more coefficients; vr=con- 
trols the 

$variance of the white noise, multiplies the quantity 
(rand-.5); 

the function up to the first end produces the output 


oF de ode 


from 
the AR filter with coefficients al,a2 and a3; 


Plotting the spectra: 


sx=freqz(var(v),[1 -[wal]],512); 
n=0:pi/512:pi-(pi/512); 

plot(n,abs(sx)); xlabel(‘Radians’);ylabel (‘ Power... 
spectrum’ }; 

%this function is useful up to three coefficients 
SAR model; 

var controls the variance of the input noise; 
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Figure 7.3.9(a) Creating an inverse system using an LMS filter. 


Example 7.3.5 (Inverse System Identification): To find the inverse of an 
unknown filter, we place the adaptive filter in series with the unknown 
system as shown in Figure 7.3.9a. The delay is needed so that the system is 
causal. Figure 7.3.9b shows a typical learning curve. In this example we used 
four coefficients FIR filter, and the input to the unknown system was a sine 
function with a white Gaussian noise. 


7.4 Performance analysis of the LMS 
algorithm 


By subtracting the Wiener filter w° (see (4.3.5)) from both sides of (7.2.4), we ` 
obtain the following equation 


w(n+1)- w° =w(n)-w° +2ue(n)x(n) (7.4.1) 


The vectors (1 +1)=w(n+1)-— w° and é(n)=w(n)-w° are known as the 
weight-error vectors, which are described on a coordinate system shifted by 


ho 





mi 





0 Rick adalah, ea aA. MAPIE A P Lek nls Pad TY ee | ee o A Fal PA A teats Curated) 
0 100 200 300 400 500 600 700 800 900 1000 
Iteration number, n 
(b) 


Figure 7.3.9(b) The learning curve of the inverse system problem. 
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we on the w plane. Therefore, (7.4.1) becomes 
(n+ 1) = O(n) + 2ue(n)x(n) (7.4.2) 
e(n) = d(n) — y(n) = d(n) — w' (n)x(n) = d(n) - x" (n)w(n) 
= d(n)- - (nyw(n)— . (n)w° +x" (n)w° G 
=d(n)-x (n)w® -x (n|w(n)-w°] 
= e°(n)~ x" (n)&(n) = e (n)-— §' (n)x(n) 
where 
e° (n)=d(n)- x" (n)w° (7.4.4) 


is the error when the filter is optimum. Substituting (7.4.3) in (7.4.2) and 
rearranging, we obtain 


E(n +1) = E(n) + 2ufe (n) — x" (n)E(n)]x(n) 
= &(n) + 2ux(n)[e°(n) — x" (nén) (7.4.5) 
= [I—2yx(n)x" (nén) + 2ue?(n)x(n) 


where I is the identity matrix and e°(n) — x" (n)&(n) is a scalar. Next, we take 
the expectation of both sides of (7.4.5) 


E{&(n + 1)} = Eff. - 2ux(n)x" (nén) + 2uE fe? (n)x(n)} 
= E{{I—2ux(n)x" (njn) 


Since e°(n) is orthogonal to all data (see Section 4.3), the last expression is 
identically zero. The expression 


(7.4.6) 


E{x(n)x" (m)&(n)} = Efx(n)x" (mHE{(n)] (7.4.7) 


is simplified by incorporating the independence assumption, which states: the 
present observation of the data (x(n), d(m)) are independent of the past obser- 
vations (x(n-1), d(n—1)), (x(n—2), d(n—2)), ... where 


x(n) =[x(n) x(n-1) x(n- 2) --- x(n-N+1)] (7.4.8) 

x(n~1) =[x(1-1) x(n 2) + x(n-N)] (7.4.9) 

Another way to justify the independence assumption is through the fol- 
lowing observation: The LMS coefficients w(n) at any given time are affected 
by the whole past history of the data (x(n — 1), d(n — 1)), (x(n — 2), d(n — 2)), ..- 


and, therefore, for smaller step-size parameter u, the past N observations of 
the data have small contribution to w() and, thus, we can say that w(n) and 


SS S—S—SLa_aQaQaa_ i 
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x(n) are weakly dependent. This observation clearly suggests the approxima- 
tion given by (7.4.7). Substituting (7.4.7) into (7.4.6) we obtain 


E{(n + 1)} = [I - 2uE{x(n)x" (EE 


(7.4.10) 
= [I- 2uR, JE{¢(n)} 


The mathematical forms of (7.4.10) and (6.2.5) of the steepest-descent 
method are identical except that the deterministic weight-error vector €(n) 
in (6.2.5) has been replaced by the average weight-error vector E{€(1)} of the 
LMS algorithm. This suggests that, on the average, the present LMS algorithm 
behaves just like the steepest descent algorithm. Like the steepest-descent 
method, the LMS algorithm is controlled by M modes of convergence, which 
are dependent on the eigenvalues of the correlation matrix R,. In particular, 
the convergence behavior of the LMS algorithm is directly related to the 
eigenvalue spread of R, and, hence, to the power spectrum of the input data 
x(n). The more flatness of the power spectrum, the higher speed of convergence 
of the LMS algorithm is attained. 


Learning Curve 


In the development below we assume the following: a) the input signal to 
LMS filter x(n) is zero-mean stationary process; b) the desired signal d(n) is 
zero-mean stationary process; c) x(n) and d(n) are jointly Gaussian-distributed 
random variables for all n; and d) at times n, the coefficients w(1) are inde- 
pendent of the input vector x(n) and the desired signal d(n). The validity of 
d) is justified for small values of u (independent assumption). Assumptions 
a) and b) simplify the analysis. Assumption c) simplifies the final results so 
that the third-order and higher moments that appear in the derivation are 
expressed in terms of the second-order moments due to their Gaussian 
distribution. 

If we take the mean-square average of the error given by (7.4.3), we 
obtain 


J(n) = Efe*(n)} = Elle’ (n) — €" (n)x(n) Ile? (n) — x" (Mn) 


(7.4.11) : 
= Efe”? (n) + EIE (n)x(n)x" (mén) — 2Ele (n) (n)x(n)} 4 


For independent random variables we have the following relations: 


Ee y SEC Ely |S EZ Ely | = ElxE ly? |x} (7.4.13) ] | 
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Based on the above two equations, the second term of (7.4.11) becomes 


(xt (n)(n) =&"(n)x(n) = number) 


ELE? (1)x(n))"} = EIE (n)x(m) x" (nén) = ELS" (n)E{x(m)x"(n)}E(n)} 
(ETR Sn) = tr{E{ $! (n)R én) 

í 

í 


E 
E 

(7.4.14) 
È 


tr{E" (MR EH = El tr{o(n) é (nR }} 
tr{E{E(n) E (n)}R,} = er K()R,} 


where we used the following properties: a) the trace of a scalar is the scalar 
itself, b) the trace, tr, and the expectation, E, operators are linear and can be 
exchanged, c) the trace of two matrices having M x N and N x M dimensions, 
respectively, is given by 


tr{ AB} = tr{BA} (7.4.15) 

The third term in (7.4.11), due to the independence assumption and due to 
the fact that e°(m) is a constant, becomes 

E{e®(n)&"(n)x(n)} = E{S" (12)x(n)e°(n)} = Eg" (n)}E{x(n)e°(n)}=0 (7.4.16) 


The second term is equal to zero due to the orthogonality property (see 
Section 4.3). 
Substituting (7.4.16) and (7.4.15) in (7.4.11) we obtain 


](n) = Efe? (n)} = Jan + PIK(A)R, } (7.4.17) 


where J... = E{(e° (n) }is the minimum mean-square error. However, 
R = QAQ", where Q is the eigenvector matrix and A is the diagonal eigenvalue 
one (see Section 5.1). Hence, (7.4.17) becomes 


(n) = Jmn + PAK (MQAQ™} = Jenin + LQ" K(1)QA} 


= a + tr{E{(Q*E(n) fee A} (7.4.18) 
oa FEELS) ET (nA min TEK (n)A} 
where 
K'(n)= E{¢'(n) č” (n)} (7.4.19) 


Recall that €’(n) is the weight-error vector in the coordinate system 
defined by the basis vectors, which are specified by the eigenvectors of R. 
Since A is diagonal, (7.4.18) becomes 


M-1 
Kn) = Joon + Dae: (1) = Jin + X AEI? (n) (7.4.20) 


where x;(n) is the ij} element of the matrix K’(n). 
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The learning curve can be obtained by any one of the equations: (7.4.17), : 
(7.4.18), or (7.4.20). It turns out that, on the average, the learning curve above ` 


is similar to the one given by the steepest-descent algorithm. 


The general solution of &’(1) is given by (6.2.15). Hence, (7.4.20) becomes : 


I) = Jan + Y A41- 2HA, PELE, (0) (7.4.21) 


Example 7.4.1: The filter H,(z), which produces the desired signal d(n) 
= x(n), is represented by the difference equation x(n) + a,x(n — 1) + a.x(n — 2) ° 


= v(n), where a, and a, are the system coefficients, and v(n) is a zero-mean ` 


white noise process of variance 07. To obtain a, and a, we use the adaptive 
predictor. The LMS algorithm for this case is given by (w = 2u) 


w(n) = w(n)+ w x(n- De(n) (7.4.22) 
where 
e(n) = d(n) — w" (n)x(n —1) = x(n) — w" (n)x(n—-1) (7.4.23) 


The learning curves, /(m) = E{e?(n)}, are shown in Figure 7.4.1 using the 
following constants: M = 2, a, = —0.96, a, = 0.2, oO. = 0.33, x = 0.04, and p’ = 
0.004. The curves were averaged over 120 runs. The following MATLAB 
function was used: 


Book MATLAB function for Example 7.4.1 


function[ems]=aaexample741 (mu,M, an) 
%function[ems]=aaexample741(mu,M) ; 
this function produces figures like in example 7.4.1; 







M = 2, oœ? = 0.33 


Mean-squared error 





200 250 300 350 400 450 500 


Number of iterations 


Figure 7.4.1 Learning curves for two different step-size parameter values. 
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gmM=number of filter coefficients;an=number of times the 
error to be averaged; 
eav=zeros(1,1000); 
dn (1)=0;dn(2)=0;x(1)=0;x(2)=0; 
for k=l:an 
for n=321000 
dn(n)=0.96*dn(n-1)-0.2*dn(n-2)+2* (rand-0.5); 
x(n)=dn(n-1); 
end; 
[w,y,e]=aalms(x,dn,mu,M); 
eav=eav+e.^2; 
end; 
ems=eav/an; 


The coefficient-error or weighted-error correlation matrix 


Since the mean-square error (MSE) is related to weight-error correlation 
matrix K(n), the matrix is closely related to the stability of the LMS algorithm. 
Therefore, /(n) is bounded if the elements of K(n) are also bounded. Since 
K(n) and K '(n) are related, the stability can be studied by using either one. 

Multiply both sides of (7.4.5) by QT and use the definitions &(n) = Q*&(n) 
and x’(n) = Q'x(n) to obtain (see Problem 7.4.1) 


E’(n+1)=(I-2pux’(n)x’" (né (n) + 2’ (n)x’(n) (7.4.24) 


Next multiply (7.4.24) by its own transpose, and take the ensemble average 
of both sides to obtain (see Problem 7.4.2) 
K’(n+ 1) =K’(n)— 2uE{x’"(n)x”* (DE (n) E (n)] 

— 2uE{E' (n) (n)x’(n)x"" (n)} 

+4 Elx (nx (mE mn) 6" (nx (nx (n) 

+ 2yEe* (n)x’(n)& (n) 

+ 2uE(e’(n)&'(n)x’" (n)} 

-4° Efe" (n)x’(m)§"" (n)x’(n)x”™ (n) 


(7.4.25) 


— 4p Ele? (n)x’(n)x"™ (nE "(mx (n) 


+4 Eje” (nx (nx (n)} 


Based on the previous independent assumption, we note the following: 
(1) (n) is independent of the data x(n) and the desired signal d(n), which is 
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also true for the transformed variables x’(m) ; 2) ¢’/(m) is independent of x(n) 
and d(n); 3) d(n) and x’(n) are zero-mean and are jointly Gaussian since d(n)  ; 
and x(n) have the same properties; 4) applying the orthogonality relationship, 
we obtain Ef{e°(n)x’(n)} = Efe’(n)Q'x(n)} = Q*E{e?(n)x(n)} = Q'0=0 ;5) e(n) 
depends only on d(n) and x(n); 6) from 4) e(n) and x’(n) are uncorrelated 
(Gaussian variables are also independent); 7) e°(1) has zero mean. With the 
above assumptions in mind, the factors of (7.4.25) become as follows: 


E{x’(n)x™ (né (n) E (n) = Efx’ x? (nHE(E’(n) E (n)} 
= E{Q*x(n)x! (n)QIK (n) = Q'R,QK'(n) = AK’(n) 


(7.4.26) 


Ef (n) E (nx (n)x? (n)} = K'(n)A (7.4.27) 


E{x’(n)x’? (MENE (n)x’(n)x’(n)} = 2AK’(n)A + tr{AK'(n)A (7.4.28) 
(see Problem 7.4.3). 


Because e°(n), x’(n), and &’(m) are mutually independent, and E{e°(n)} = 0 Í 
then (7.4.29) to (7.4.32) are true. 


Ele (nx (ME (n)} = Ele (nJElx (éE (n)}=0 (MxM matrix) (7.4.29) 


Ele (mE lnx T (n)} = 0 (7.4.30) 
Ele (nx mE" (n)x’(n)x’" (n)} = 0 (7.4.31) 
E{e°(n)x’(n)x’! mE mx T (n)} = 0 (7.4.32) 


Efe”? (nx (nx T (n)} = Ele Elx (mx T n} = J A (7.4.33) 


min 


Substituting (7.4.26)-(7.4.33) in (7.4.25) we find 


K’(n+1) = K’(n)—2 u(AK’(n)+ K’(n)A) + 8u? AK’(n)A 


(7.4.34) 
+4u7tr{AK’(nJJA+ 47]. A 


min 


Concentrating on the iit component of both sides of (7.4.34) we find (see 
Problem 7.4.4) 


M-~1 


k. (n+1)= (1— 4u, + 8u2A2)k, (n) 4 4A, > Ak’ (n)+ 4u"J,.,,4; (7.4.35) 
j=0 


Since K'(n)is a correlation matrix, k’? < k? (n)k’, (n)for all values of i and j, 
and since the update of k; (n) is independent of the off-diagonal elements 
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of K’(n), the convergence of the diagonal elements is sufficient to secure the 
convergence of all the elements and, thus, guarantees the stability of the 
LMS algorithm. Therefore, we concentrate on (7.4.35) withi=0,1,2,...,M-—1. 

Equation (7.4.35) can be written in the following matrix form (see 
Problem 7.4.4) 


k’(1 +1) = Fkn) +4] anA (7.4.36) 
where 
k(n)=[ki(n) kao) e kaama (7.4.37) 
Mie Ae A m Anal (7.4.38) 
F =diag[ f, fi + fy J+ 4u’ A" (7.4.39) 
f,=1-4uA,+ 8p" A? (7.4.40) 


It has been found that if the eigenvalues of F are all less than one, the LMS 
algorithm is stable or equivalently the elements k'(n) remain bounded. An 
indirect approach to obtain stability is given below. 


Excess mean-square error and misadjustment 


According to (7.4.17), we may write the expression of the difference between 
the mean-squared error and the minimum mean-square error as follows: 


JAM =] -J nin = PKR} (7.4.41) 


The steady state form of (7.4.41) is 
J f°) = Je) — Janin = EK (%)R, 3 (7.4.42) 


and it is known as excess MSE. Equation (7.4.20) gives another equivalent 
form, which is 


M-1 
Jl) = X Ak; cc) = E! k’ (o0) (7.4.43) 
i=0 
As n > œ, we set k’(n+1)= k’(n) and, hence, (7.4.36) becomes 


k’(oo) = 4u7J_. (I-F)'A (7.4.44) 
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As a result, (7.4.43) becomes 
J.(0°) =4u°J_ A (I- Fy" A (7.4.45) 
which indicates that J. is proportional to J,,;,. The normalized J,,(ce) is equal to 


bie) 
J 


min 


M= =4u°A'(I-F)'A (7.4.46) 


which is known as the misadjustment factor. 
If A(N x N), B(M x M), and C(N x M) are matrices that have inverses, then 


(A+CBC')1=A7-A'C(B1+C’A'C)'C'AT (7.4.47) 
But 
1- f, 0 0 
0 1-f, -- 0 
I-F=| =4u°AA' =F +a" (7.4.48) 
0 0 me ieee re 


where a = —4u’. Therefore, (7.4.46) takes the form (see Problem 7.4.5) 











FCAATE" 
Me Ea Sal ES 1 
: 1  1+aA FA 
yw (7.4.49) 
HA „4. 
aFOAATE pS 1-24, 
Sg AE 44 1 —_ i0 
L ep ETA 





z M-1 
Ey a 
1-2 j À; 
i=0 


where in (7.4.47) we set C =À, B=I, g’ =~a, and A = F . Small M implies that 
the summation on the numerator is small. If, in addition, 2u A, << 1, we 
obtain the following result 


M-1 ua M-1 
ts A. = utr{R 7.4.50 
2o ud = utr{R,} ( ) 





Hence, (7.4.49) becomes 


utriR } 
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In addition, for M << 0.1 the quantity wir{R,} is small and (7.4.51) becomes 
Me= utr{R,} (7.4.52) 


Since r,(0) is the mean-squared value of the input signal to an M-coefficient 
filter, we write 


M= uMr,(0) = uME{x*(n)} = uM(Power input) (7.4.53) 


The above equation indicates that to keep the misadjustment factor small 
and at a specific desired value as the signal power changes, we must adjust 
the value of u. 


Stability 
If we set 
Le ye (7.4.54) 
fod 1- 2U A, 
then 
M= aa (7.4.55) 


We observe that L and M are increasing functions of u and L, respectively 
(see Problem 7.4.6). Since L reaches 1 as M goes to infinity, this indicates 
that there is a value {ilna that u cannot surpass. To find the upper value of 
u we must concentrate on an expression that can easily be measured in 
practice. Therefore, from (7.4.55) we must have 


M-1 


ra 
eect eee (7.4.56) 
a 1- 2U À; 


The above equation indicates that the maximum value of u must make 
equation (7.4.56) an equality. It can be shown that the following inequality 
holds. 





M-1 
TE D 
Y Sse (7.4.57) 
ot ] -24 A 


{= 


ae 1-24% A, 
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Hence, if we solve the equality 





i=0 =1 (7.4.58) 


for u, then (7.4.56) is satisfied. The solution of the above equality is 











1 1 1 
= = = 7.4.59 
Bmax MI 3tr{R_} 3(Input Power) ( ) 

3% À i 
i=0 
and, hence, 

< U< 7.4.60 
H 3tr{R_) l ) 


LMS and the steepest-descent method 


The following similarities and differences exist between the two methods: 


1. The steepest-descent method reaches the minimum MSE Jmin 
as n — œ and w(n)> w°. 

2. The LMS method produces an error J(cc) that approaches Jmin 
as n — œ and remains larger than Jmin- 

3. The LMS method produces a w(n), as the iterations n — œ, that is 
close to the optimum w°. 

4. The steepest-descent method has a well-defined learning curve con- 
sisting of a sum of decaying exponentials. 

5. The LMS learning curve is a sum of noisy decaying exponentials, 
and the noise, in general, decreases the small values the step-size 
parameter u takes. 

6. In the steepest-descent method, the correlation matrix R, of the data 
x(n) and the cross-correlation vector pa,(”) are found using ensemble 
averaging operations from the realizations of the data x(n) and de- 
sired signal d(n). 

7. In the LMS filter, an ensemble of learning curves is found under 
identical filter parameters and then averaged point by point. 


Example 7.4.2 (Channel Equalization): Figure 7.4.2 shows a base-band 
data transmission system equipped with an adaptive channel equalizer and 
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Channel 
H(z) 


v(n) 







Equalizer 
W(z) 


d(n) 







Training 
sequence 
generator 


(a) 









s(n — L) = d(n) 








Random independent 
binary (+ or —1) 
generator 


Adaptive 
LMS filter (+) em 
W(z) w 


Figure 7.4.2 (a) Base-band data transmission system equipped with an adaptive 
channel equalizer and (b) a training system. 


Channel 
H(z) 


(b) 


a training system. The signal s(n) transmitted through the communication 
channel is amplitude or phase modulated pulses. The communication chan- 
nel distorts the signal (the most important one is the pulse spreading) and 
results in overlapping of pulses and, thus, creating the intersymbol interference 
phenomenon. The noise v(n) further deteriorates the fidelity of the signal. It 
is ideally required that the output of the equalizer is the signal s(n). Therefore, 
an initialization period is used during which the transmitter sends a 
sequence of training symbols that are known to the receiver (training mode). 
This approach is satisfactory if the channel does not change characteristics 
rapidly in time. However, for slow changes the output from the channel can 
be treated as the desired signal for further adaptation of the equalizer so 
that its variations can be followed (decision directed mode). 

If the equalization filter is the inverse of the channel filter, W(z) = 1/ 
H(z), the output will be that of the input to the channel, assuming, of 
course, that noise is small. To avoid singularities from the zero of the 
channel transfer function inside the unit circle, we select an equalizer such 
that W(z)H(z) =z“. This indicates that the output of the filter W(z) is that of 
the input to the channel shifted by L units of time. Sometimes, more general 
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filters of the form Y(z)/H(z) are used where Y(z) #z “. These systems are 
known as the partial-response signaling systems. In these cases, Y(z) is 
selected such that the amplitude spectra are about equal over the range of 
frequencies of interest. The result of this choice is that W(z) has a magnitude 
response of about one, thereby minimizing the noise enhancement. 

Figure 7.4.2b shows a channel equalization problem at training stage. 
The channel noise v(n) is assumed to be white Gaussian with variance O°. 
The equalizer is an M-tap FIR filter and the desired output is assumed 
to be delayed replica of the signal s(n), s(n — L). The signal s(n) is white, 
has variance o? = 1, has zero mean value, and is uncorrelated with v(n). 
The channel transfer function was assumed to take the following FIR 
forms: 


A(z) = H (z) = 0.34 + z!—0.34z7 


(7.4.61) 
H(z) = H,(z) = 0.34 + 0.827 + 0.127 

The solution of the two systems above, were selected based on the 

eigenvalue spread of their output correlation matrix. Figure 7.4.3a shows the 

learning curve with a channel system of the form H(z) = 0.34 + z~t — 0.342 


J averaged 20 time: 





J averaged 20 times 
> > > > 
NO e fon) ioe) — 


(æ) 





Figure 7.4.3 Learning curves of the channel system. 
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and two different step-size parameters and c = 0.5. The variance of the noise 
was o% = 0.043. Figure 7.4.3b shows the learning curve with channel system 
of the form H(z) = 0.34 + 0.871 + 0.1z * and with the same step-size parameter 
and noise variance as in case (a) above. In both cases, the curves were 
reproduced 20 times and averaged. The delay L was assumed to be 3, and 
the number of filter coefficients were 12. The curves were produced using 
the following MATLAB function: 


Book MATLAB function for channel equalization 


function [Jav,wav,dn,e,x]=aaequalizer(av,M,L,h,N,mu,c) 
function [Jav,wav,dn,e,x]=aaequaliz- 
ger(av,M,L,h,N,mu,c) 
‘this function solves the example depicted 
gin Fig. 7.4.2;av=number of times to average e(error or 
learning curve)and w(filter coefficient) ;N=length 
of signal s;L=shift of the signal s to become 
sdn; h=assumed 
impulse response of the channel system;mu=step fac- 
6tCOr}; 
$M=number of adaptive filter coefficients; c=constant 
multiplier; 
wl=[zeros(1,M)]; 
J={zeros(1,N)1; 
for i1=l:av 
for n=1:N 

Vin) =¢* randn: 

s(n)=rand-.5; 

if s(n) <=0 

s(n)=-1; 
else 


end; 
dn=[zeros(1,L) s(:,1:N-L)]; 
ych=filter(h,1,s); 
x=ych(1,1:N)+4v; 
[w,yv,e]=alms(x,dn,mu,M); 
wl=wl+w; 
J=J+e.*2; 

end; 

Jav=J/av; 

wav=wl/av; 


It is recommended that the reader changes the input values to observe 
the effect on the learning curve. 
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7.5 Complex representation of LMS algorithm 


In some practical applications it is mathematically attractable to have com- 
plex representation forms of the underlying signals. For example, base-band 
signals in quadrature amplitude modulation (QAM) format are written as a 
summation of two components: real in-phase component and imaginary 
quadrature component. Furthermore, signals detected by a set of antennas 
are also written in their complex form for easy mathematical manipulation. 
For this reason, we shall present in this section the most rudimentary deri- 
vation of the LMS filter assuming that the signals are complex. 

In the case where complex-type signals must be processed, we write the 
output of the adaptive filter in the form 


y(n) = y w,(n)x(n— k) (7.5.1) 
and the error is given by 
e(n) = d(n)— y(n) (73:2) 
Therefore, the MSE is 
J = Efe(n)e * (n)} = E{le(n)?} (7.5.3) 


Let us define the complex filter coefficient as follows: 
w, = a,(n)+ jb (n) k=0,1,---,-M-1 (7.5.4) 
Then the gradient operator V has the following k® element: 


d . oO 
* a(n) ` ob (n) 





V, AV k=0,1,---,M-1 (7.5.5) 
which will produce the following k element of the multi-element gradient 
vector VJ: 


oJ . oJ 


aoa (7.5.6) 








Na 


It is noted that the gradient operator is always used to find the minimum 
points of a function. The above equation indicates that a complex constraint 
must be converted to a pair of real constraints. Hence, we set 


oJ 


oJ _ 
ab, (n) i 


a” 








j (7.5.7) 
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The kt element of the gradient vector, using (7.5.3), is 


a ve l; ,_%æ(n) = ) 
ae 7.5.8 

















V J=Ef 
Taking into consideration (7.5.1) and (7.5.2), we obtain 


æn) _ od(n) _ Sr. Sa ola, (0- 7b, (n)] jb] 





== -k) (7.5.9 
un) an) + aa, (n) 2. i, (n) ope EK) (7.5.9) 
which can be written as 
eM) ey 
aa, (1) x(n —k) (7.5.10) 
oe*(n) _ ; BO). oie pe Oe cer 
Hoy ae (18), gy TOO (75.1) 


Introducing (7.5.10) and (7.5.11) into (7.5.8), we find the relationship 
VJ = Va JW) = -2E{x(n— ke *(n)} (7.5.12) 
and, thus, the gradient vector becomes 
VJ (w(n)) = -2E{e * (n)[x(n) x(n-1) + x(n-—M4+1)]"} (7.5.13) 


If w(n) is the filter-coefficient vector at step n (time), then its update 
value w(n + 1) is given by (see (6.1.3)) 


w(t 1) = w(n)- u V J (win) (7.5.14) 


Next, we replace the ensemble average in (7.5.13) by the instantaneous 
estimates e*(n)x(n) to obtain 


w(11+ 1) = w(n) + 2ue * (n)x(n) (7.5.15) 
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Table 7.5.1 Complex Form of the LMS Algorithm 


Parameters: M = number of filter coefficients 
u = step — size factor 
x(n) = [x(n) x(n - 1)---x(1 - M + 1)"] 
Initialization: w=0 
Computation: For n =Q, 1, 2, ... 
1. y(n) = wH(n)x(n) H: Hermitian = conjugate transpose 
2. e(n) = d(n) — y(n) 
3. w(n + 1) = w(n) + 2He*(1)x(n) 


which is the LMS recursion formula when we are involved with complex- 
valued processes. For complex signal it has been shown that the misadjust- 
ment factor M is given by 


M-1 u A, 
1-pa 
M=—— pt (7.5.16) 
A 
1- 2i 
m lH A, 
provided, also, that 
1 : 
O<u< i (7.5.17) 2 
and 
M-1 , 
Migi (7.5.18) 3 
dips 
k=0 pea 


The LMS algorithm for complex signals is shown in Table 7.5.1. 


Book MATLAB function for complex LMS algorithm 


function[w,y,e,J,wl]=aacomplexnims (x,dn,mubar,M,c) 
$function[w,y,e,J,wl]=aacomplexnims (x,dn,mubar,M,c) 
$x=input data to the filter;dn=desired signal; 
$M=filter order;c=constant;mubar=step-size equivalent 
$parameter; 

%x and dn must be of the same length;J=learning curve; 
N=length(x) ; 

y=zeros(1,N); 

w=zeros(1,M);%initialized filter coefficient vector; 
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1.5 
3 1 
x< 
o 
20.5 
0 
0 10 20 30 40 50 60 70 80 90 100 
n 
1 
Z 0 
= 
-2 
0 10 20 30 40 50 60 70 80 90 100 


abs [J (n)] 





Figure 7.5.1 Results of Example 7.5.1 using the LMS complex-valued form. 


for n=M:N 
xl=x(n:-1:n-M4+1);%for each n vector xl is of length 
$M with elements from x in reverse order; 
y(n)=conj (w) *x1'; 
e(n)=dn(n)-y(n); 
w=w+ (mubar/ (c+conj (x1) *x1'))*conj (e(n)) *x1; 
wi (n-M+1,:)=w(1,:); 
end; 
Jee 2s 
the columns of the matrix wl depict the history of the 
filter coefficients; 


Example 7.5.1: With the input signal x(n) = sin(0.27n) + j1.5(rand—0.5), the 
desired signal d(n) = sin(0.2 zn), u = 0.01 and number of coefficients M = 16, 
we obtain the results shown in Figure 7.5.1. 


Problems 
7.2.1 Develop the LMS algorithm for complex-valued functions. 


7.3.1 If an AR system has poles at 0.85e™™* with input of white noise v(n), 
find its discrete-time representation. 
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7.4.1 Verify (7.4.24). 
7.4.2 Verify (7.4.25). 
7.4.3 Verify (7.4.28). 


7.4.4 Verify (7.4.36). 
7.4.5 Verify (7.4.49). 


7.4.6 Verify that (7.4.54) and (7.4.55) are increasing functions of u and £, 
respectively. 


7.4.7 Use the average eigenvalue to obtain the average time constant for the 
LMS algorithms and state your observations for M. e 


7.4.8 Discuss the effect of the initial value w(0) on the transient behavior of 
the LMS algorithm. TE 


7.4.9 Develop the LMS algorithm using the steepest-descent method and 
setting J(n) = e(n) instead of Efe?(n)}. 


7.4.10 (a) Let the impulse response of a FIR channel (see Figure 7.4.2b) have 
the values h = [0.22 1 0.22]! and n = 1, 2, 3 and zero otherwise. The random 
binary generator produces a Bernoulli sequence with s(n) = +1 having zero 
mean a unit variance. The sequence v(n) is white with zero mean and vari- 
ance oł = 0.01. The equalizer is FIR filter with 5 coefficients. Find the corre- 
lation matrix R, and the eigenvalues. Next find the step-size parameter based 
on the spread of the eigenvalues. b) Repeat part a) but with h = [0.39 10.39]. 


Hints-solutions-suggestions 

Tezh: 

R =x(n)x" (n), Pia =X(n)d * (n), VJ =-2x(n)d * (n) + 2x(n)x™ (n)w(n) , 
w(n+1)= w(n)+ 2ux(n)[d * (n) — x" (njw), y(n) = w" (n)x(n) = x" (n)w(n) 
e(n) = d(n) — y(n), w(n +1) = w(n) + 2ux(n)e * (n), H = conjugate transpose (or 
Hermitian). 


La: 
H(z) =X(z)/V(z)=1/[( —0.85ei7/4z1)(1 — 0.85e77/4z-1)] or V(z) = X(2[1 -1.7 2 zo 


+(0.85)°z *]. Hence, the inverse Z-transform gives v(n) = x(n) — 1.202x(n — 1) 
+ 0.7225x(n — 2). 
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7.4.1: 
Q'EN +1) = Q'EN) — 2ux(n)x" (n)E(n)] + 2ye (mxn) = QTE(n) 
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— 2HO'x(n)x" (nQ Q'EN] + 2u e (nx (n) = (n) — 2uQ* x(n(Q* x(n)’ én) 


+ 2ue°(n)x’(n) = [I 2ux’(n)x’* (nlé (n) + 2ye’ (n)x'(n). 
Note:Q°" = (Q?) =(Q")' =Q. 


7.4.2: 
Hint: § &'(n+1)=&' (n)—2uE" (n)x’(n)x’" (n) + 2ue (n)x’" (n). 


7.4.3: 


Because x(n) and &(n) are independent Efx’'(n)x (né (né x'(n)x’ T (n)} 


= Efx'(n)x’ "(nE (mé (nx (nx T (n) = Elx’ (nx (MK (nx (n)x i) (1) 


x’! (n)K’(n)x’(n) = x (n)x; (n)k;, (n) 


C(n) = M x M matrix = x’(1)x’" (n)K’(n)x’(n)x’" (n) 


Cin (n) = X; (11), (m) x! (n)x/(n)k; (n) 
i=0 j=0 
Efc, (n) -SY etx’ (n)x’, ()x/(n)x/(n))ki (1) 
i=0 j=0 


For Gaussian random variables we have 


Blo SEI LE EER EA a Ele EN 


Elx(n)x’(n)} =A eli- j) 


et 


(2) 


(3) 


(5) 


(6) 
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The above equation is due to the relation E{Q'x(n)x'(n)Q}=Q'RQ=A, 
Rq, = 4q 


Efx (mn)x, (nx (nx (n) = 44;ôl - m)o(i — 7)+A,A,,6(1 - 1)d(m — j) 


(8) 
+å, ôl- j)5(m - i) 
Substitute (8) in (5) to find 
M-1 M-1 M-1 M-1 
E{c, (n) = =>) AA, 5(I — m) (i — jk, (n) + A, A, 8l- i) 6m — j)ki(n) 

i=0 j=0 i=0 j=0 

M-1 M-1 

n A, A,, 8l- j) Sn- ik (n) = 2, 6 ‘my Aki(n) +A, À ki (n) 
i=0 j=0 


+À, À k(n) forl=0,1,...,M-1andm=0,1,...,M-—1. But k; (n) = k; (0) and 


m mi 


in 


Sako tr{AK’(n)} and A, Ap Kin 0) + A, Ap K pi) = 2A, A,,k;, (1). Based on 


se results, (7.4.28) is apparent. 


7.4.4: 
; ; 2,2 2 2 
koo +1) komt) fo +4u Ao 4u Ay Ag -4u À M-120 
; , Me D e see 
kot) k(t) i Aghy Jatu A AUT AnA 
kM Mat) Ai aA fee a +4u2A2 
T 0^M-1 “1*M-1 M-1 M 
A, 
ki (nt1) ki (nt) = 
A, 0 
e| ki (ntl) ki (ntl) +4u’ T 
kf na 4+1) 
M-1,M-1 Anes 


where the ii" component of both sides is 
k; (n+l) = fk; (n) + Ape a [Aki (1m) + AW AK, () t+ AWA KO) 
+47] A, 


min” 7 


which is identical to (7.4.36). 
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7.4.5: 
The a’ in (7.4.49) has been substituted with a in the development below. 





























t AB A Se AE T E aH 
“i 1- fo TEF d= toy (l- fY ae G-A- f) d= h) 


oA BA vt A) ff 
A-A-A -A-A -AFS fp 1-f, 




















A? À? 2-1 | 
a 0 + 1 ` uA, 
= l-fy If, _ __is0 1 2A, 
A? A? 2-1 | 
1-(1-4y2, +84) 1-(1- 4a, +8u?A?) 41-2, 
which confirms (7.4.49) for M = 2. 


7.4.6: 
Hint: Take derivatives with respect to u and L, respectively, and note the 


positive values of = z and = T 





7.4.7: 
16., 1 i 
A =— J A, =—#(R,)7... = see (6.2.12)) and, 
” M 22 M” Rl eae Eag | )) 


therefore, M=uMA =M/(27, Bas Observations: (1) M increases linearly 
with filter length for a fixed A *(2) the setting time of the LMS algorithm is 


proportional to Tse a and, hence, M is inversely proportional to the settling 
time, (3) M is proportional to u and inversely proportional to Tes but U 
and T are inversely proportional to each other. Careful consideration 


mse ,dv 


must be given when we are considering values for M, u and T 


mSe ,a0" 
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7.4.8: 
The averaged learning curve of the LMS algorithm is close to the steep- 
est-descent method. Hence, we write (6.2.15) in the form: 


M-1 
Jn) = Jmn + X All- 2H Ag)" E? (0) (1) 


If we set 
E(0) = w(n)— w® =0-w°, then &(n) = Q™E0) = -O'w° = -w° (2) 


Hence, (1) becomes 


J(1) Jan + J Ay (1 HA) we? (3) 
k=0 


Equation (3) indicates that if w?”’s corresponding to the smaller eigenvalues 
of R, are all close to zero, then the transient behavior of the LMS algorithm 
is determined by the larger eigenvalues whose associated time constants are 
small, and thus, a fast convergence takes place. If, however, the we’s corre- 
sponding to the smaller eigenvalues of the correlation matrix are significantly 
large, then we will observe that the slower modes will dominate. 


7.4.9: 
From (6.2.1) we find 


w(H+ 1) = w(n) — u Ve’ (n) (1) 


where n is the iteration number, and 





v=o Mise a , w(n)=[w,(n) w,(n) = Wy N. 


But 
de? (n)/ðw; = 2e(njæ(ny dw, = 2e(n)d[d(n) — y(n) ow, = —2e(n)dy(n)/dw, = -2e(n) 


g Vu (1)x(n — i) | /dw,(n) = -~2e(n)x(n—i) and hence, 
i=0 


Ve? (n) = —2e(n)x(n) 
where 
x(n) =[x(n) x(n—1) x(n-2) + xn-M4 DJ’. 
Therefore, (1) becomes 


w(n+1) = w(n) + 2ue(1)x(n) 
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7.4.10: 
3 
x(n) = ` h(k)s(n-k)+v(n) and hence, we obtain 


k=1 


E{x(n)x(n— m)} = r,(m) = 


ELS Asn- k) + OY” KOS- m- 1) + O(n m) 
k=1 k=1 


= 5> h(k)h(i)E{s(n— k) 
s(n-m-i)}+ p h(i)E{v(n)s(n — m- iJ) 


+> h(k)E{s(n — k)o(n — m)} + Efo(n)o(n — m) 
k 


= ` nk) > h(i)o? 5(m + i—k) + 026(m) = o; X hhk - m)+ o?ô(m) 
: : k=l 


= ofr, (m) + 075(m) 
since s(n) and v(m) are uncorrelated with zero mean value. Hence, 


r,(0) = 7,(0) + 0; = SRR) +0, = k (1) +h? (2)+ h’ (3) +02 = 1.168 


k=1 


r()=7,(1) = Ss h(k)h(k — 1) = h(2)h(1) + h(3)h(2) = 0.44, 1,(2) = 7,(2) = 0.0484 


k=1 


r (3) = 1,(3) = 0, r,(4) =17,(4)=0 and the correlation matrix becomes 


1.168 0.44 0.0484 0 0 
0.44 1.168 0.44 0.0484 0 
R =| 0.0484 0.44 1.168 0.44 0.0484 |. 
0 0.0484 0.44 1.168 0.44 
0 0 0.0484 0.44 1.168 


The maximum and minimum eigenvalues are: Anin = 0.7031, Ana, = 1.9869 
and, hence, the eigenvalue spread is XR) = A_| /A,,, = 4.3014. The value of 
the step-size parameter is found using (7.4.58): 


H= 1/GBetr{R, }) = 0.095. Part b) is found using similar steps. 


chapter 8 


Variations of LMS 
algorithms 


8.1 The sign algorithms 


This chapter covers the most popular modified LMS-type algorithms pro- 
posed by researchers over the past years, as well as some recent ones pro- 
posed by the authors. Most of these algorithms were designed on an ad hoc 
basis to improve convergence behavior, reduce computational requirements, 
and decrease the steady-state mean-square error. We start with this section 
by introducing the sign algorithms. 


The error sign algorithm 


The error sign algorithm is defined by 


w(nt+1)= w(n)+ 2 sign(e(n)) x(n) (8.1.1) 
where 
1 n>0 
sign(n)=4 0 n=O (8.1.2) 
-] n<0 


is the signum function. By introducing the signum function and setting p to 
a value of power of two, the hardware implementation is highly simplified 
(shift and add/subtract operation only). 


Book MATLAB function for sign algorithm 


function[w,y,e,J,wl]=aalmssign(x,dn,mu,M) 
$function[w,y,e,J,wlj=aalmssign(x,dn,mu,M); 
$all quantities are real-valued; 

$x=input data to the filter;dn=desired signal; 
%M=order of the filter; 


137 


138 Adaptive filtering primer with MATLAB 


$mu=step-size parameter;x and dn must be of the same length 
N=length (x) ; 
y=zeros(1,N); 
w=zeros(1,M);%initialized filter coefficient vector 
for n=M:N 
xl=x(n:-1:n-M4+1);%for each n the vector xl is produced 
%of length M with elements from x in reverse order; 
y (n)=w*x1'; 
e(n)=dn(n)-y(n); 
w=w+2*mu*sign(e(n))*x1; 
wl (n-M+1, :)=w(1,:); 
end; 
J=e.^2; 
the columns of wl depict the history of the filter 
coefficients; 


The normalized LMS sign algorithm 


The normalized LMS sign algorithm is (see below for normalized LMS 
algorithm) 


w(n-+1) = w(n) + 2p eM) (8.1.3) 
€+|x(n) 


Book MATLAB function for normalized LMS sign algorithm 


function[w,y,e,J,wl]=aanormlmssign(x,dn,mu,M, ep) 
6function[w,y,e,J,wl]=aalmssign(x,dn,mu,M,ep); 
all quantities are real-valued; 
$x=input data to the filter;dn=desired signal; 
$M=order of the filter; 
mu=step-size parameter;x and dn must be of the same length; 
$ep=sm 
N=length (x); 
y=zeros(1,N); 
w=zeros(1,M);%initialized filter coefficient vector 
for n=M:N 
xl=x(n:-1:n-M+1);%for each n the vector xl is produced 
of length M with elements from x in reverse order; 
y (n)=w*x1'; 
e(n)=dn (n) -y (n); 
w=w+2 *mu*sign (e(n})*x1/(ept+x1*x1'); 
wl (n-M4+1,:)=w(1,:); 
end; 
J=e.*2; 
$the columns of wl depict the history of the filter 
coefficients; 
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Signed-regressor algorithm 


The signed-regressor or data-sign algorithm is given as follows 
w(1 +1) = w(1) + 2ue(n) sign(x(n)) (8.1.4) 
where the sign function is applied to x(n) on element by element basis. 


Sign-sign algorithm 
The sign-sign algorithm is given by 


w(n +1) = w(t) + 2 sign(e(n)) sign(x(n)) (8.1.5) 


Book MATLAB function for sign-sign algorithm 
function[{w,y,e,J,wi]J=aalmssignsign(x,dn,mu,M) 
$function([w,y,e,J,wlj=aalmssignsign(x,dn,mu,M) 
wall quantities are real-valued; 
$x=input data to the filter;dn=desired signal; 
%M=order of the filter; 
mu=step-size parameter;x and dn must be of the same length 
N=lLength (x) ; 
y=zeros({1,N) ; 
w=zeros(1,M);%initialized filter coefficient vector 
for n=M:N 
xl=x(n:-1l:n-M+1);%for each n the vector xl is produced 
%gof length M with elements from x in reverse order; 
y (n)=w*x1'; 
e(n)=dn(n)-y(n); 
w=wt+2*mu*sign(e(n))*sign(x1); 
wl (n-M4+1,:)=w(1,:); 
end; 
J=e.^2; 
%gthe columns of wl depict the history of the filter 
%gcoefficients; 


8.2 Normalized LMS (NLMS) algorithm 
Consider the LMS recursion algorithm 


w(n+1)=w(n)+ 2u(n)e(n)x(n) (8.2.1) 
where the step-size parameter u(n) varies in time. We have observed in 


Chapter 7 that the stability, convergence, and steady-state behavior of the LMS 
algorithm, are influenced by the filter length and the power of the signal. 
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Therefore, we can set 


1 1 


ae E E 8.2. 
ax a 2i = 


u(n) 


in (8.2.1) to find the recursion 


w(n +1) = w(n)+ e(n)x(n) (8.2.3) 


1 
x" (1)x(11) 


The above equation can be found using a posteriori error (see Farhang- 
Boroujeny, 1999) minimization (see Problem 8.2.1) or using a constrained opti- 
mization procedure (see Haykin, 2001). Although (8.2.3) is appealing, in practice 
a more relaxed recursion is used that guarantees reliable results. Hence, we write 





w(n+1)= w(n)+ e(n)x(n) (8.2.4) 


m 
E+x'(n)x(n) 


where H and € are constants. The small constant € prevents division by a 


very small number of the data norm. 
The normalized LMS (NLMS) algorithm is shown in Table 8.2.1. 


Book MATLAB function for LMS algorithm with complex data 


function[w,y,e,J,wl]=aacomplexnims (x,dn,mubar,M,c) 
%function[w, y,e,J,wl]=aacomplexnims (x, dn,mubar ,M,c) 
%x=input data to the filter;dn=desired signal; 
%M=filter order;c=constant;mubar=step-size equivalent 
S$parameter; 
x and dn must be of the same length;J=learning curve; 
N=length (x) ; 
y=zeros(1,N); 
w=zeros(1,M);%initialized filter coefficient vector; 
for n=M:N 
xl=x(n:-1:n-M4+1);%for each n vector xl is of length 
$M with elements from x in reverse order; 
y (n)=conj (w) *x1'; 
e(n)=dn (n)-y(n) ; 


w=w+ (mubar/ (ct+tconj (x1) *x1'))*conj(e(n))*x1; 
wl (n-M+1, :)=w(1,:)}); 

end; 

J=e.^2; 


%ğthe columns of the matrix wl depict the history of the 
Sfilter coefficients; 
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Table 8.2.1 The NLMS Algorithm 


Real-valued functions Complex-valued functions 
Input: 
Initialization vector: w(n) = 0 
Input vector: x(n) 
Desired output: d(n) 
Step-size parameter: H 
Constant: € 
Filter length: M 
Output: 
Filter output: y(n) 
Coefficient vector: w(n + 1) 
Procedure: 
1) y(n) = w™(n)x = w(n)x" (n) 1) y(n) = w" (n)x(n) 
2) e(n) = d(n) - y(n) 2) e(n) =d(n) — w™(n)x(n) 
3) m 3) w(n+1)=w(n) 


w(n+1)= w(n)+ e(n)x(n) 


CORE 
€+x'(n)x(n) Be a x(n)e *(n) 


e+x"(1)x(1) 


Note: The superscript H stands for Hermitian, or equivalently conjugate transpose. 


8.3 Variable step-size LMS (VSLMS) algorithm 


The VSLMS algorithm was introduced in 1986 to facilitate the conflicting 
requirements, whereas a large step-size parameter is needed for fast conver- 
gence and a small step-size parameter is needed to reduce the misadjustment 
factor. When the adaptation begins and w() is far from its optimum value, the 
step-size parameter should be large in order for the convergence to be rapid. 
However, as the filter coefficients w(n) approaches the steady-state solution, 
the step-size parameter should decrease in order to reduce the excess MSE. 

To accomplish the variation of the step-size parameter, each filter coef- 
ficient is given a separate time-varying step-size parameter such that the 
LMS recursion algorithm takes the form 


w(n+1)=w,.(n)+2u,(ne(n)x(n-i) 1=0,1,...,.M-1 (8.3.1) 


where w;(n) is the i coefficient of w(n) at iteration n and p(n) is its associated 
step-size. The step-sizes are determined in an ad hoc manner, based on mon- 
itoring sign changes in the instantaneous gradient estimate e(1)x(m — i). It was 
argued, that successive changes in the sign of the gradient estimate, indicates 
that the algorithm is close to its optimal solution and, hence, the step-size 
value must be decreased. The reverse is also true. The decision of decreasing 
the value of the step-size by some factor c, is based on some number m; 
successive changes of e(1)x(n — i). Increasing the step-size parameter by some 
factor c,is based on m, successive sign changes. The parameters m, and m, can 
be adjusted to optimize performance, as can the factors c, and cz. 
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Table 8.3.1 The VSLMS Algorithm 


Input: 
Initial coefficient vector: w(0) 
Input data vector: x(n) = [x(n) x(n - 1) ... x(n- M +1) 
Gradient term: @,(n — 1) = e(n — 1)x(n — 1), 9,2 — 1) =e(n —-1)x(n - 1), ..., 
Built — 1) = e(n — 1)x(n — M) 
Step-size parameter: u(n — 1), p(n — 1), ..., Human — 1) 
a = small positive constant 
Umax = positive constant 
Outputs: 
Desired output: d(n) 
Filter output: y(n) 
Filter update: w(n + 1) 
Gradient term: gn), 2,(1), ..-, 8m7) 
Update step-size parameter: m(n), 14,(%), «.., Uy) 
Execution: 


1) y(n) = wi(n)x(n) 
2) e(n) = d(n) — y(n) 
3) Weights and step-size parameter adaptation: 
Fort] G22 = 
gi(n) = e(n)x(n — i) 
u(n) = p(n — 1) + assign(g,(1))sign(gi(n)) 
if m(n) > Umax MiC) = Umax 
if Hi) < Hmin Hi = Manin 
wn + 1) = wn) + 2u,(n)g (1) 
end 


The set of update Equations (8.3.1) may be written in the matrix form 
w(1+1) = w(n) + 2u(n)e(1)x(1) (8.3.2) 


where a(n) is a diagonal matrix with the following elements in its diago- 
nal: u(n) (n), +++ Uy, (7). The VSLMS algorithm is given in Table 8.3.1. 


8.4 The leaky LMS algorithm 


Let us assume that x(n) and d(n) are jointly wide-sense stationary processes 
that determine when the coefficients w(n) converge in the mean to w° = R Pa 
That is 


lim E{w()} = w° =R Pa (8.4.1) 
We start by taking the expectation of both sides of the LMS recursion as 
follows: 


E{w(n +1)} = E{w(n)} + 2uE{d(n)x(n)} — 2uE{x(2)x"(n)w(n)} (8.4.2) 
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where y(n) = x'(n)w(n). Assuming that x(n) and w(n) are statistically 
independent (independence assumption), (8.4.2) becomes 


E(w(n + 1)} = E{w(n)} + 2uE{d(n)x(n)} — 2uE{x(n)x"(n)JE{w(1)) 
= (I—2uR, )E{(w(1)} + 2up,, (1) 


which is similar to the steepest-descent method equation, see (6.2.3), with 
the difference that here we have ensemble average. This suggests that the 
steepest-descent method is applicable to ensemble average E{w(n + 1)}. 
Rewriting (6.2.8) in its matrix form we obtain 


8.4.3) 


En) =(I-2uA)"E(0) k=0,1,-,M—-1; n=1,2,3, (8.4.4) 


We observe that w(n) > w° if é&(n) = w(n)- w° > 0 as n> œ or when €'(1) = 
Q'é(n) converges to zero. The k row of (8.4.4) is 


E(n) =(1-2u,)"€.(0) (8.4.5) 
which indicates that ¢/(1) > 0 if 
[-2u4|<1 k=0,1,---,M-1 (8.4.6) 


OT 


O<u< LA (8.4.7) 
A, 


To be more restrictive, we must set the value of u to obey the inequality 


(8.4.8) 





1 

O<uU< is: 

If A, = 0, (8.4.5) indicates that no convergence takes place as n approaches 

infinity. Since it is possible for these undamped modes to become unstable, 

it is important for the stabilization of the LMS algorithm to force these modes 

to zero. One way to avoid this difficulty is to introduce a leakage coefficient 
into the LMS algorithm as follows: 


w(n+1)=(1— 2 y)w(n) + 2ue(n)x(n) (8.4.9) 


where 0 < y << 1. The effect of introducing the leakage coefficient y is to force 
any undamped modes to become zero and to force the filter coefficients to zero 
if either e(n) or x(n) is zero (the homogeneous equation w,(n + 1) = (1 — 2 uyw;(n) 
has the solution w,(7) = A(1 - 2 uy), where A is a constant). 

We write (8.4.9) in the form (e(n) = d(n) —x'(n)w(n)) 


w(1+4+ 1) = [I- 2u[x(n)x' (1) + yIJw(n) 4+ 2ud(n)x(n) (8.4.10) 


144 Adaptive filtering primer with MATLAB 


By taking the expected value of both sides of the above equation and using 
the independence assumption, we obtain 


E{tw(n+ 1)} =[—2u[R, + yHE{w(a)} + 2up,.(1) (8.4.11) 


Comparing the above equation with (8.4.3), we observe that the auto- 
correlation matrix R, of the LMS algorithm has been replaced with R, + y1. 
Since the eigenvalues of R, + yI are A, +y, and since A, 20, all the modes 
of the leaky LMS algorithm will be decayed to zero. Furthermore, the con- 
straint for the step-size parameter becomes 


1 
0 < u < ———— 8.4.12 
u hy ( ) 
As n> œ, w(n +1) = w(n) and, hence, (8.4.11) becomes 
lim E{w(n)} =[R, + Vp, (8.4.13) 


Ho 





which indicates that the leakage coefficient introduces a bias into steady-state 
solution Rp,- For another way to produce the leaky LMS algorithm see 
Problem 7.4.3. 


Book leaky LMS MATLAB function 


function[w,y,e,J,wl]=aaleakylms (x,dn,mu,gama,M) 
$function[w,y,e,J,wl]=aaleakylms (x,dn,mu, gama, M) 
$all signals are real valued;x=input to filter; 
$y=output from the filter;dn=desired signal; 
emu=step-size factor;gama=gamma factor<<1; 
$M=number of filter coefficients;wl=matrix whose M 
rows give the history of each filter coefficient; 
N=Length (x); 
y=zeros(1,N); 
w=zeros(1,M); 
for n=M:N 
xl=x(n:-1:n-M+1); 
y(n)=w*x1'; 


i 


e(n)=dn(n)-y(n); 
w= (1-2*mu* gama) *w+2*mu*e(n) *x1; 
wl (n-M+1,:)=w(1,:); 
end; 
J=e.^2; 


Figure 8.4.1 shows the input data {x(n)} to the leaky LMS filter, the output 
of the filter and the learning curve. The following signals and parameters 
were used: dn = desired signal = sin(0.17n), v = 2(rand-0.5), x = data = dn + v, 
u = 0.01, y =0., M = 16. 
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Figure 8.4.1 The input data, the output signal, and the learning curve of an adaptive 
filtering problem using leaky LMS algorithm. 


8.5  Linearly constrained LMS algorithm 


In all previous analyses of Wiener filtering problem, steepest-descent 
method, Newton’s method, and the LMS algorithm, no constrain was 
imposed on the solution of minimizing the MSE. However, in some appli- 
cations there might be some mandatory constraints that must be taken into 
consideration in solving optimization problems. For example, the problem 
of minimizing the average output power of a filter while the frequency 
response must remain constant at specific frequencies (Haykin, 2001). In this 
section, we discuss the filtering problem of minimizing the MSE subject to 
a general constraint. 
The error between the desired signal and the output of the filter is 


e(n) = d(n) — w'(n)x(n) (8.5.1) 
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We wish to minimize this error in the mean-square sense, subject to the 
constraint 


c'w=a (8.5.2) 


where a is a constant and c is a fixed vector. Using the Lagrange multiplier 
method, we write 


J. = Efe*(n)} + A(c'w-a) (8.5.3) 


where A is the Lagrange multiplier. Hence, the following relations 





VY Jag d ale 6 8.5.4 
wie =O an vin (8.5.4) 


must be satisfied simultaneously. The term oJ _/dA produces the constraint 
(8.5.2). Next we substitute the error e(n) in (8.5.3) to obtain (see Problem 8.5.1) 


J.=Jmin TER EtA E-a) (8.5.5) 
where 
Eln) = w(n)-w°, w° =R Pas R, = E{x(n)x"(n)} (8.5.6) 
and 
Pa = Eld(nx(n)}, a’ =a-c'"w" (8.5.7) 


The solution now has changed to V,J =0 and dJ/ðA =0 . Hence, from (8.5.5) we 
obtain l 


oJ, 
dÉ, pLa E T S a ae Ċi 
V.J. = | :3 +A}: J=0 (85.8) 
OS. pa iia 4 ee Pa ae 
dé, 


or in matrix form 


2R E +Ac=0 (8.5.9) 
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where & is the constraint optimum value of the vector €. In addition, the 
constraint gives the relation 


ae =c'&-a’=0 (8.5.10) 


Solving the system of the last two equations for A and ° we obtain 


, FP -l 
= 2a o AR € 
cIRic © eRe 





(8.5.11) 


Substituting the value of À in (8.5.5) we obtain the minimum value of J. to be 


g? 


c Rc 


Pae 


But w(n)= (n)+ w° and, hence, using (8.5.11) we obtain the equation 


‘Rt 
we =w +" (8.5.13) 
cR c 





Note: The second term of (8.5.12) is the excess MSE produced by the 
constraint. 


To obtain the recursion relation subject to constraint (8.5.2), we must 


proceed in two steps: 


Step 1: 
w'(n)=w(n)+ 2ueln)x(n) (8.5.14) 

Step 2: 
w(n+1)=w'(n)+n(n) (8.5.15) 


where n(n) is chosen so that c'w(n +1)= a while n'(n)n(n) is minimized. In 
other words, we choose the vector n(n) so that (8.5.2) holds after Step 2, 
while the perturbation introduced by n(n) is minimized. The problem can 
be solved using the Lagrange multiplier method that gives 


ga e (8.5.16) 


c'c 
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Thus, we obtain the final form of (8.5.15) to be 


a—c'w’(n) 
c'c 





w(n+1)=w (n)+ (8.5.17) 


The constraint algorithm is given in Table 8.5.1. 


Table 8.5.1 Linearly Constrained LMS Algorithm 


Input: Initial coefficient vector, w(0) =0 
Input data vector, x(n) 
Desired output, d(n) 
Constant vector, c 
Constraint constant, a 
Output: Filter output, y(n) 
Procedure: y(n) = w'(n)x(n) 
e(n) = d(n) — y(n) 
w’(n) = w(n)+ 2ue(n)x(n) 
a—e'w'(n) 


w(nt+1)= w’(n)+ = 
c'c 





Book constraint LMS MATLAB function 


function[w,e,y,J,w2]=aaconstrainedlms (x,dn,c,a,mu,M) 
$function[w,é,y,J,w2]=aaconstrainedims(x,dn,c,a,mu,M) ; 
$x=data vector;dn=desired vector of equal length with x; 
$c=constant row vector of length M;a=constant, e.g. 
$a=0.8;mu=step-~ 
$size parameter;M=filter order(number of filter 
S$coefficients) ; 
$w2=matrix whose columns give the history of each 
coefficient; 
w=zeros (1,M); 
N=length (x); 
for n=M:N; 

y (n)=w*x (n:-1:n-M+1)'; 

e(n)=dn(n)-y(n); 

wl=w+2*mu*e(n)*x(n:-1:n-M+1); 

w=wl+( (a-c*wl')*c/(c*c')); 

w2 (n-M+1,:)=w(1,:); 
end; 
J=e. ^2; 


Figure 8.5.1 shows the results of a constrained LMS filter with the following 
data: dn = sin(0.1nz); v = noise = 2(rand-0.5); x = data = dn + v; c= ones(1, 32); 
a = 0.8; u = 0.01; M = 32. 

As an example of solving a constrained optimization problem using 
Lagrange multiplier method, the NLMS recursion can be obtained as a 
solution of the following problem: 
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Figure 8.5.1 The input data, the output signal, and the learning curve of an adaptive 
filtering problem using constrained LMS algorithm. 


Minimize min Ilw(1)- w(n— 1)ll* subject to the constraint d(n) = w'(n +1) x(n) 
The first step in the solution is to write the cost function as: 


Kn) =| Awl? + Aldin) — w(n +1)x(n)] (8.5.18) 
where 


Aw = w(n+1)—- w(1) (8.5.19) 
Differentiating the cost function with respect to w(7 + 1) leads to 


alu = 2(w(1+1)— w(n))—-Ax(n) (8.5.20) 
dw(n +1) 
Setting this results to zero results in 


w(n+1)= w(t) +S Ax) (8.5.21) 
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Substituting this last result into the constraint d(n) = w'(n+1)x(n), we obtain 


d(n) = win) + ; Axim) x(n) 
(8.5.22) 


n 1 2 
= w'(n)x(n)+ 54 x(n) | 
Since e(n) = d(n) — w'(n)x(n), solving Equation (8.5.22) for A leads to 


yeaa (8.5.23) 


KOJN 





Substituting Equation (8.5.23) in (8.5.21) results in 


w(n+1)=w(n)+ — e(n)x(n) (8.5.24) 
[xo] 


Finally, introducing a factor u in Equation (8.5.24) to control the change in 
the weight vector, we obtain 





w(11 +1) = w(n) +—--— e(n) x(n) (8.5.25) 
[x)| 


Clearly, Equation (8.5.25) is the conventional NLMS algorithm. 


8.6 Self-correcting adaptive filtering (SCAF) 


One way by which we may improve the output of the adaptive filter so that 
it is approximately equal to the desired one is to use a self-correcting adaptive 
filtering as shown in Figure 8.6.1. In this proposed configuration the desired 





Figure 8.6.1 Block diagram of the SCAF method. 








Chapter 8: Variations of LMS algorithms 151 


signal is compared with signals that become closer and closer to the desired 
one. The output of the i stage is related to the previous one as follows: 


Ya) = Y, * Wi, (8.6.1) 


Book MATLAB function for self-correcting LMS algorithm 


function[w,y,e,J)])=aaselfcorrectinglms (x,dn,mu,M,TI) 

*function[w,y,e,J]aaselfcorrectinglms (x,dn,mu,M,I); 

l'w(1,:),y(1,:),e(1,:)]=aalms (x,dn,mu,M); 

for i=2:I%I=number of iterations, I<8-10 is sufficient; 
[wli,:),y(1,:),e(i,:) ]=aalms(y(i-1,:),dn,mu,M); 

end; 

J=e.^2; 


Figure 8.6.2a shows the input data into the self-correcting filter, Figure 8.6.2b 
shows the output, y,(1), of the first stage of the self-correcting filter, and 
Figure 8.6.2c shows the output, y,o(7), of the filter at its tenth stage. The data 
for these results were: dn = desired signal = sin(0.1nz); v = noise = randn; 
x = input data = dn + v; mu = 0.01; M = 10, I = number of stages = 10. 
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Figure 8.6.2 Noise reduction with a self-correcting filter. 
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Book MATLAB function for the self-correcting sign regressor 
function[w,y,e,J]=aaselfcorsignregreslms(x,dn,mu,M,T) 
$function[{w,y,¢e,J]=aaselfcorsignregreslms(x,dn,mu,M,I); 
$x=input data to the filter;dn=desired sig- 
nal;length(x)=length (dn); 

y=output of the filter an Ixlength(x) matrix;J=error 
function an 

SIxlength(x) matrix;I=number of stages; 


[w(1,:),y(1,:),e(1,:),J7(1,:) ]=aalmssignedregressor... 

(x,dn,mu,M); 

for i1i=2:I 
[wli,:),y(i,:),e(i,:),J(i,:)]=aalms(y(i-1,:),dn,mu,M)}); 

end; 

J=e.^2; 


Figure 8.6.3 shows the learning curves for the output of the first and 
fourth stages. It is apparent that the self-correcting adaptive filter improves 
with the number of stages used. The data were: dn = desired signal = 
sin{0.2n7), v = noise = 1.5(rand-0.5), x = dn + v, u = 0.01, M = 8, and I = 4. 
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Figure 8.6.3 Learning curves of the first and fourth stages of a self-correcting filter. 
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Book MATLAB function for self-correcting sign-sign algorithm 


function[w,y,e,J]=aaselfcorsignsignims (x,dn,mu,M, I) 
$function[w,y,e,J]=aaselfcorsignsignims(x,dn,mu,M,I); 
%x=input data to the filter;y=output data from the filter, 
$y is an Ixlength(x) matrix; J=learning curves, an Ix- 
$length(x) 

$matrix;mu=step-size parameter;M=umber of coefficients; I= 
$number of stages;w=an Ixlength(x) matrix of filter coef- 
%ficients; 

%$dn=desired signal; 


(w(1,:),y(1,:),e(1,:),d(1,:) ]=aalmssignsign(x,dn,mu,M); 
for i=2:I 

[w(i,:),yv(i,:),e(1,:),J0(1,:) ]=aalms(y(i-1,:),dn,mu,M); 
end; 
J=e.^2; 


The self-correcting adaptive filtering can easily be implemented by using 
all types of adaptive filters such as normalized, constrained, transform 
domain, etc. 


8.7 Transform domain adaptive LMS filtering 


The implementation of the LMS filter in the frequency domain can be accom- 
plished simply by taking the Discrete Fourier Transform (DFT) of both the 
input data, {x(1)}, and the desired signal, {d(n)}. The advantage of doing this 
is due to the fast processing of the signal using the Fast Fourier Transform 
(fft) algorithm. However, this procedure requires a block-processing strategy, 
which results in storing a number of incoming data in buffers, and thus, 
some delay is unavoidable. 

The simplest approach is that given by Dentino et al. (1978), and it is 
shown in Figure 8.7.1. The signals are processed by block-by-block format, 
that is {x(n)} and {d(m)} are sequenced into blocks of length M so that 

x(n)=x(iM+n) d(n)=diM+n) n=0,1,---,-M-1; 1=0,1,--- (8.7.1) 
The values of the 7 block of the signals {x;(n)} and {d,m)} are Fourier trans- 
formed using the DFT to find X,(k) and D,(k), respectively. Due to DFT 
properties, the sequences X,(k) and Dk) have M complex elements corre- 
sponding to frequency indices (‘bins’) k= 0, 1, ...,M -1 


M-1 .2mnk 
X.(k) = DET{x,(n)} = X xne m pe iM (8.7.2) 


n=0 





M-1 ,2nnk 
D.(k) = DET{d,(n)} = SY ane M  k=0,1,--,M-1 (8.7.3) 


n=0 
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Figure 8.7.1 Frequency domain adaptive filter (the circle indicates multiplication 
term by term and the subscript i indicates the i" block of data). 


During the i block processing, the output in each frequency bin of the 
adaptive filter is computed by 


Y,(k) = W, ,X,(k) k=0,1,2,--,M-1 (8.7.4) 


where W,, is the ktt frequency bin corresponding to the 7'* update (corre- 
sponding to the ït block data). The error in the frequency domain is 


E(k) = D.(k)—Y,k) k=0,1,2,--,M-1 (8.7.5) 
The system output is given by 


„2ank 


M-1 
y,(n) = y(iM +n) = IDFT(Y,(k)} = — Ye M n=0,1,2,--M-1 (8.7.6) 
k=0 





To update the filter coefficients we use, by analogy to LMS recursion, 
the following recursion: 


Wi, =W, + 2UE,° X (8.7.7) 
where 
Wa = Wao VV i Ea Worn lk 
W, = LW; 9 Wii = Woa 


E, =[E,(0) E() = E(M-1)]' 


X =[X'(0) XO = X(M-1)] 
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The dot (.) in (8.7.7) implies element-by-element multiplication and * 
stands for complex conjugate. If we set X; in the form 


X,(0) 0 e 0 
; O X) -> 0 
X, = diagtX,(0) X,(1) = X(M-1p=| | AMD : (8.7.8) 
O 0 X.(M —1) 
then (8.7.7) becomes 
Wia E W, T 2uX E, (8.7.9) 


Therefore, the Equations (8.7.1)-(8.7.7) constitute the frequency domain of 
the LMS algorithm. The Book MATLAB function that gives the coefficients 
after I blocks (or iterations) is given below. 


Book MATLAB Fourier transform LMS algorithm 
function[A]=aaftlms(x,d,M,1,mu) 
$function[A]=aaftlms(x,d,M,1I,mu); 
wk=zeros (1,M); 
for i=0:I $T=number of iterations (or blocks); 
if I*M>length(x)-1 
('error:I*M<length(x)-1') 


end; 
$M=number of filter coefficients; 
xl=x (M* (i+1):-1:i*M4+1); 
dil=d (M* (i+1) :-1:1*M4+1) ; 
xk=fft (x1); 
dk=fft (d1); 
yk=wk. *xk; 
ek=dk-yk; 


wk=wk+2*mu*ek.*conj (xk); 

A(1+1, :)=wk; 
end; 
%all the rows of A are the wk'sat an increase order 
%of iterations (blocks); 
%to filter the data, wk must be inverted in the time 
%gdomain, convolve with the data x and then plot the 
real part of the output y, e.g. wn4=the forth iteration 
S=i1fft(A(4,:)),yn4=filter(wn4/4,1,x) for even M; 


Example 8.7.1: The following Book MATLAB program produced the 
outputs that are presented in Figure 8.7.2. 


M=32; I=10; mu=0.01; 
n=0:999; 
d=sin(0.1*pi*n}; vel 5*randn (1.1000). 
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Figure 8.7.2 Adaptive filtering with FT LMS algorithm. 


x=d+v; 

[A]=aaftlms(x,d,M,I,mu); 

wn8=1fft(A(8,:));% the inverse fft of row 8 of A; 

yn8=filter(wn8,1,x); 

subplot (2,1,1);plot({x(1,1:200));xlabel({(`n’');ylabel... 
Cre (yy 

subplot (2,1,2);plot (real (yn8(1,1:200)));xlabel(`n’);... 
ylapel (Syne; 


Convergence 


Let the signals x(n) and y(n) be jointly stationary, and let the initial filter 
coefficient be zero, W, = 0. Substituting (8.6.4) and (8.6.5) in, we obtain (see 
Problem 8.7.1) 


2 + 
W, =(1-2u|X,| )W, , + 2uD,(k)X7(k) (8.7.10) 


The expected value of (8.7.10), assuming W,, and X;(k) are statistically inde- 
pendent, is given by 


E{W,,, .} = (1 2uE(|X,(K| DE(W, ,}+ 2uE(D (OX (O) (87-11) 
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Because x(n) and x(k) are stationary, their statistical characteristics do 
not change from block to block and, therefore, the ensembles E{|X,(k)I*} 
and E{D, (k)X; (k)} are independent of i but depend on k. Taking the Z- trans- 
form of (8. ie 11) with respect to i of the dependent variable W,,, we find the 
relation (see Table 2.3.1 and Problem 8.7.2) 


WG) , HEID (OX; W) 


8.7.12 
-1 z~1 ( ) 


W, (2) = -24E flx o J 


Applying the final value theorem (see Table 2.3.1 and Problem 8.7.2), we 
obtain the steady-state value for the filter coefficients 


W"} = E(DAK)X; (k)} (8.7.13) 





Let the mean filter coefficient error E(k) be defined by 
E(k) = E{W, ,J- E{W} (8.7.14) 


Then, using (8.7.13) and (8.7.11), we find (see Problem 8.7.3) 


PE (k) k=0,1,2,--,M-1 (8.7.15) 





E, (k)=(1-2uE{ 


Using the iteration approach for the solution of the above difference equa- 
tion, we find 


E (k)=(1- E(X, (Q WE (k) k=0,1,2,--,M-1 (8.7.16) 


The solution converges if 


1 : 2uE|X;(K) ai o ee o (8.7.17) 
EX Q) 


which shows that the power of the input plays a fundamental role in con- 
vergence and stability. 
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8.8 Error normalized LMS algorithms 


A new class of LMS algorithms based on error normalization has been reported 
by the authors in IEEE conferences in 2004 and 2005. These algorithms are: 


1. Error Normalized Step-Size (ENSS) LMS Algorithm 


w(n+1) = w(n) + > x(n) e(n) (8.8.1) 


l+u Je, (n) | 


2. Robust Variable Step-Size (RVSS) LMS Algorithm 


ule, (n)|" 


ae a a aA (8.8.2) 
ale +a -wx 


w(n+1)= w(n)+ 


3. Error-Data Normalized Step-Size (EDNSS) LMS Algorithm 





w(n+1) = w(n) + aot —x(n)e(n) (8.8.3) 
OL | e, (n)| + (1-a) | x(n) | 
where 
o2 la 
[e (n) | =Y \e(n-al (8.8.4) 
and 
Jen) | -5 e(n-i)| (8.8.5) 
i=0 
Comments 


e The parameters a, L, and uin all of these algorithms are appropriately 
chosen to achieve the best trade-off between rate of convergence and 
low final MSE. L could be constant or variable (L = n, for example), 
depending on whether the underlying environment is stationary or 
nonstationary. 

e The variable step-sizes in all of these algorithms should vary between 
two predetermined hard limits. The lower value guarantees the ca- 
pability of the algorithm to respond to an abrupt change that could 
happen at a very large value of iteration number n, while the maxi- 
mum value maintains stability of the algorithm. 
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Plant noise Plant output 


Plant input 
v{n) d(n) 


x(n) 






Adaptive 
filter 


Figure 8.8.1 Adaptive plant identification. 


Simulations 


The ENSS algorithm using an error vector of increasing length (L = n) is 
compared first with the NLMS algorithm in system identification shown in 
Figure 8.8.1. The adaptive filter and the unknown system are both excited 
by a zero-mean white Gaussian signal of unit variance. The length of the 
unknown system impulse response is assumed to be N = 4. The internal 
unknown system noise v(n) is assumed to be white Gaussian with mean = 0 
and variance 07, = 0.09. The optimum values of u in both algorithms are 
chosen to obtain the same exact value of misadjustment, M = 2%. The value 
of M is estimated by averaging excess mean-square error (EMSE) over n 
after the algorithm has reached steady-state, and dividing the result by 0%. 
Simulation plots are obtained by ensemble averaging of 200 independent 
simulation runs. Figure 8.8.2 shows the learning curves of the two algorithms. 
While retaining the same level of misadjustment, the ENSS algorithm 


MSE in dB 





0 200 400 600 800 1000 


Iteration number 


Figure 8.8.2 MSE learning curves of the ENSS and NLMS algorithms for white 
Gaussian input. 
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clearly provides faster speed of convergence than the NLMS algorithm. A MAT- 
LAB m file for the ENSS algorithm, which produces Figure 8.8.2, is shown below. 


Book MATLAB function of the ENSS algorithm 

ENSS Algorithm in System Identification (See Figure 
8.8.1) 

NN is the length of the error vector e, in the ENSS 
algorithm. 

mu: variable step-size of the ENSS algorithm. 





mul: the dimensionless step-size of the ENSS algorithm. 
N: length of the adaptive filter. 
I: number of independent simulation runs used in the 
plot of the learning curve. 
LL: total number of iterations. 
h: the impulse response of the unknown plant. 





x is the input signal to both the adaptive filter and 
the unknown system. 
dd is the output of the unknown system. 





n: the internal noise of the unknown plant. 
J is the MSE. 
clear all; 


X X X X XX XX BDO X de co CO KYKY 


randn ('state',0); 
I=200; LL=1000; J=zeros(1,LL); 
Jminn=zeros(1,LL);Jex=zeros(1,LL); 
N=4; NN=10*N; h=[1 0.7 0.5 -0.2]; 
for i=1:I 
y=zeros(1,LL); w=zeros({1,N); e=zeros(1,LL); 
X=zeros(N,1); D=zeros(NN,1); 
x=sqrt(1)*randn(1,LL) +s 
denn=0; n=sqrt(0.09)*(randn(1,LL)); 


for k=1:LL, 
dd=filter(h,1,x); 
X=[(x(k); X(1:N-1)]; den=X'*X; y=w*X; 


e(k)=dd(k)+n(k)-y ; 

mul=0.8; denn=denn+e (k)^2;mu=(mu1/ (1+mul*denn)); 

w=wtmu*e(k) *X'; 

J(k)=J3 (k)+(abs (e(k)))%2; 
Jminn (k)=Jminn(k)+n(k)%*2; 
end; 

end; 
J=J/I; JIminl=Jminn/ l-JImin=sum(Iminl)/LL: Jex=J-JImin; 
Jinf=(1/200)*sum(J(LL-199:LL}));JSSdB=10*l0og10(Jinf); 
Jexinf=abs (Jinf-Jmin) ;JexinfSSdB=10*1log10(Jexinf) ; 
MM=Jexinf/Jmin; Mpercent=MM*100; 
[mul, Jinf, Jmin, JSSdB, Jexin£SSdB,Mpercent ] 
nn=0:LL-1;plot(nn,10*log10(abs((J)))); 
hold onep lot (nn,10*l0og10 (Jmin}); 
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Figure 8.8.3 MSE learning curves of the RVSS and NLMS algorithms for an abrupt 
change in the plant parameters. 


A comparison of the RVSS algorithm with the NLMS algorithms is dem- 
onstrated next for white Gaussian input in system identification setup. It is 
assumed that @ = 0.5, and L = 10N in the RVSS algorithm. The length of the 
adaptive filter is assumed to be N = 10. The internal unknown system noise 
is white Gaussian with mean = 0 and variance O f equals to 0.01. Simulation 
plots are obtained by ensemble averaging of 200 independent simulation runs. 
Figure 8.8.3 shows the learning curve of both algorithms for the case with an 
abrupt change in the impulse response of the plant, h. In particular, it is 
assumed that all the elements of h are multiplied by (-1) at iteration number 
1500. Figure 8.8.4 shows the plot of the ensemble average trajectories of the 
fifth coefficient of the adaptive filter. The actual value of the corresponding 
unknown system coefficient to be identified is 0.5. The superiority of the RVSS 
algorithm is evident. A MATLAB m file for the RVSS algorithm with this 
assumed abrupt change in the plant parameters is shown below. 


Pook MATLAB function of the RVSS algorithm 
RVSS Algorithm in System Identification (See Figure 
S aBa L) 
NN is the length of the error vector e, in the RVSS 
algorithm 
mu: variable step-size of the RVSS algorithm. 
mul: the dimensionless step-size of the RVSS algorithm. 
a= œ in the algorithm. 
N: length of the adaptive filter. 
I: number of independent simulation runs used in the 


cw X X XX X CO X Æ 


o o 


plot of the learning curve. 
LL: total number of iterations. 


o o 


h: the impulse response of the unknown plant. 
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E(w5) 





“0 500 1000 1500 2000 2500 3000 


Iteration number 


Figure 8.8.4 Ensemble averages of the 5th coefficient of the adaptive filter in the RVSS 
and NLMS algorithms for the case with an abrupt change in the plant parameters. 


oe 


x 1s the input signal to both the adaptive filter and 
the unknown system. 
dd is the output of the unknown system. 


cf oe oe 


n: the internal noise of the unknown plant. 
J is the MSE and Jex is the excess MSE. 
Mpercent is the misadjustment percentage. 
JSSDB is the steady state MSE in dB. 
clear all; 
randn('state',0O); 

T=200; LL=3000; J=zeros(1,LhL); 
Jminn=zeros(1,LL) ;Jex=zeros(1,LL); 
N=10; NN=10*N; 
for I=1-T 
hed Ou 04.35 Od Oe OA 03 Rae Deka iS 
y=zeros(1,LL); w=zeros(1,N); e=zeros(1,LL); 
X=zeros(N,1); D=zeros(NN,1); 
x=sqrt (1) *randn (1 ,LGL)3 
denn=0; n=sqrt(0.01)*(randn(1,LL)); 
for k=1:LL, 

dd=filter(h,1,x); 

X=[x(k); X(1:N-1)]; den=X'*X; y=w*X; 

e(k)=dd(k)4+n(k)-y ; 

IE kKe=15002 
he= (Ow 02 O43: Oe 0D 


o2 


oe o 


Ook Os! Oe uae 
end; 
mul=0.07; D=[e(k); D(1:NN-1)];denx=(D'*D); a=0.5; 
denn=dennt+e(k)*2; mu=(mul*denx)/... 
((a*denn+({1-a) *den)); 
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w=wtmu*e(k) *xX'; 

J (k)=J (k)+(abs (e(k)))%*2; 

Jminn(k)=Jminn(k)+n(k)%*2; 

end; 

end; 
J=J/ I; IJminl=Jminn/T;Jmin=sum(Jmin1)/LL; Jex=J-Jmin; 
Jinf=(1/200) *sum(J (LL-199:LL) ) ;JSSdB=10*10g10 (Jinf) ; 
Jexinf=abs (Jinf-Jmin) ; JexinfSSdB=10*10g10 (Jexinf) ; 
MM=Jexinf/Jmin; Mpercent=MM* 100; 
[mul, Jinf, Jmin, JSSdB, JexinfSSdB,Mpercent ] 
nn=0:LL-1;plot(nn,10*logl10(abs({((J)))); 
hold on; plettnn,10* 1601.0 Gimin):)4 


Finally, the performance of the EDNSS algorithm is compared with the 
NLMS algorithm in an adaptive noise canceller shown in Figure 8.8.5. The 
simulations are carried out using a male native speech saying “sound editing 
just gets easier and easier” sampled at a frequency of 11.025 kHz. The number 
of bits per sample is 8, and the total number of samples is 33000 or 3 sec of real 
time. The same value of step-size (= 0.1) was used in both algorithms to achieve 
a compromise between small EMSE and high initial rate of convergence for a 
wide range of noise variances. In the EDNSS algorithm, we used @ = 0.7 and 
L=20N. The order of the adaptive filter was assumed to be N = 10. Figure 8.8.6, 
from top to bottom, shows the original clean speech, corrupting noise 
with op = 0.01, speech corrupted by noise, and the recovered speech after noise 
cancellation using EDNSS algorithm. Listening tests show that the recovered 
speech is of a high quality and is very close to the original speech. Figure 8.8.7 
compares the performance of the EDNSS algorithm with that of the NLMS for 
the case when o7 = 0.01. The figure shows plots of the EMSE in dB for that 
noise level of the two algorithms. While both algorithms have almost the same 
initial rate of convergence, the average EMSE in EDNSS is less than that of the 
NLMS by 10.6 dB. The values of EMSE were measured in both algorithms over 
all samples starting from sample number 2000, where the transient response has 
approximately ended. A MATLAB m file for the EDNSS algorithm in adaptive 
noise canceller for the above mentioned values of parameters is shown below. 





Figure 8.8.5 Adaptive noise canceller. 
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v,(n) 





S(n) + v,(n) 





0 0.5 1 1.5 2 2.5 3 x 104 


e(n) 





0 0.5 1 1.5 2 2.5 3 x 104 


Iteration (sample) number 


Figure 8.8.6 From top to bottom: Original clean speech S(n), noise that corrupts 
speech v,(n), corrupted speech S(n) + v,(n), recovered speech e(n) using EDNSS 
algorithm (0; = 0.01). 


EDNSS Algorithm in Adaptive Noise Canceller (see Figure 
8.8.5). 

S: Speech signal vector and M is its length. 

LL=total number of iterations =M= total number of speech 


dP dP of 


oe 


% samples. 

% N: length of the adaptive filter. 

% fs: sampling frequency 

% nbits: number of bits per second 

% NN is the length of the error vector e, in the EDNSS 
% algorithm. 

% hl is the impulse responses of the autoregressive (AR) 
% filter between the reference input and the primary 

% microphone 


h2 is the impulse responses of the AR filter between 


ce g&æ 


the reference input and the adaptive filter w. 


de 


JZ is the excess MSE smoothly averaged over N1 samples. 


oe 


Mu: Dimensionless step-size of the EDNSS algorithm. 
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1 EDNSS EMSEss = —41.2 dB 
2 NLMS . EMSEss = -30.6 dB 


Be, 


= 
2 


EMSE in dB 





0 0.5 1 1.5 2 2.5 3% 104 


Iteration number 


Figure 8.8.7 Excess mean-square error of the EDNSS and NLMS algorithms 
(o? =0.01). 


alpha: a parameter used in the EDNSS algorithm. 


ce cP 


el is the excess error. 
[S,fs,nbits]=wavread('C:\sound_editing'); 
S=S(9500:42500) ; 

S=S'; M=length(S); ri=(S*S')/M; 
randn('state',0O);rand({'state',0); 

hi=[1 -0.3 -.1]; h2=[1 -0.2]; 

N=10; NN=20*N; N1=200; LL=M; 
J=zeros(1,LL); Jl=zeros(1,LL); JZ=zeros(1,LL); 
g=sqrt (0.01) *randn(i,M); 

vl=filter(1,h1,g}; v2=filter(1,h2,g); 


e=zeros(1,LL); y=zeros(1,LL) ;w=zeros(1,N); 
V=zeros(N,1);El=zeros(N1,1);D=zeros(NN,1); 
d=S+v1; 


for k=1:LL, 
V=[v2 (k); V(1:N-1)]; den=V'*V; y(k)=w*V; 
e(k)=d(k) -y (k) ; e1 (k) =e (k) -S(k); 
alphal=0.7; Mu=0.1; D=[e(k); D(1:NN-1) ];denx=(D'*D); 
w=w+ (Mu/ (alphal*denx+(l-alphal) *den))*e(k)*V'; 
El=[e1 (k); E1(1:N1-1)]; JZ(k)=J3Z(k)+((EB1L'*E1)/N1); 
end 
JZ=JZ/1I; F=2000; 
JJZex=(1/ (LL-F) ) *sum(JZ(F:LL-1)) ;J7ZexdB=10* log10 (JJZex) ; 
rli=(1/(LL-F})*S((F:LL-1)}*S((F:LL-1})'; 
MM3=JJZex/r1l1; 
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[Mu, MM3 *100, JJZexdB] 
J11=J1(1:LL});nn=0:LL-1; 


plot (nn, 10*lo0g10 (abs ((JZz}))})); 
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Table 8.8.1 shows a summary of the LMS-type algorithms presented in 


this chapter. 


Table 8.8.1 Summary of the LMS Algorithms Presented in Chapter 8 


x(n) =[x(1)x(n—1) --- x(n- M)] ,w(n) = [w (1) w, (n) = wy)’, e(n) = d(n) - y(n) 


Algorithm 
1. LMS 
2. LMS with complex data 


3. Sign LMS 

4. Sign-regressor LMS 
5. Sign-sign LMS 

6. Normalized LM 


9a. e-Normalized LMS 


9b. e-Normalized LMS 
with complex data 


10. Normalized LMS sign algorithm 


11. Leaky LMS 


12. Constrained LMS 


13. Self-correcting LMS 


14. Transform domain LMS 

15. Self-correcting adaptive 
filtering (SCAF) 

16. ENSS Algorithm 

17. RVSS Algorithm 

18. EDNSS Algorithm 


Recursion 
w(n+1)= w(n) + 2pe(n)x(n) 
w(n+ 1) = w(n) + 2pe *(n)x(n) 
y(n) = w" (n)x(n)(H = conjugate transpose) 
w(n+1) = w(n) + 2usign(e(n))x(1) 
w(n+1)= w(n) + 2ue(n)sign(x(n)) 
w(n +1) = w(n)+ 2ysign(e(n))sign(x(1)) 

1 
w(n+1)= E 
with u(n) = 12x" (n)x(1)] 
w(n+1)= w(n)+ 


7 | +x" ()x(n) 
H = step-size parameter 


e(n)x(n) 


e(n)x(n) 


€ = prevents division by very small number 





w(n+1)= w(n)+——+——e*(n)x(n) 
E€+x (n)x(n) 
H = conjugate transpose 
w(n+1)= avon ee 
e+|x(n)| 


w(n+1)=(1—2py)w(n)+ 2pe(n)x(n) 
0 << <1 
w(n+1)= ao e no, 


c'c 


w'(n)= w(n)+ 2pe(n)x(n) 

c = constant vector, a = constant 
Y(t) = yin) Wia 

see also the m-file in the text 
see Sec. 8.7 

see Sec. 8.6 


see Sec. 8.8 
see Sec. 8.8 
see Sec. 8.8 











Chapter 8: Variations of LMS algorithms 167 


Problems 
8.1.1 Show that the step-size of a sign algorithm is equivalent to 


u’(n) = U/l e(n) |. Discuss the value of u’(n) for stability. 


8.2.1 Minimize the posteriori error e, (7) = d(n)-— w'(n +1)x(n) to obtain the 
. 5 ps 
normalized LMS algorithm. 


8.4.1 To find the leaky LMS algorithm, apply the LMS gradient descent 
algorithm to minimize the error J(n) = e*(n)+ yw" (n)w(n). 


8.5.1 Verify (8.5.5). 
8.7.1 Verify (8.7.10). 
8.7.2 Verify (8.7.12). 


8.7.3 Verify (8.7.15). 


Hints-solutions-suggestions 


8.1.1: 

We write (8.1.1) in the form w(n +1) = w(n)+ 2u(e(n)/ e(n))x(n). We observe that 
u(n) = 2H oy increases as le(n) decreases. Therefore, y must be very small 
for the logarithm to converge. If we choose very small u, we automatically 
choose very small y’(n), and thus, the convergence at the beginning is slow. 
But as y(n) increases the convergence becomes faster. 


8.2.1: 
Substituting the LMS recursion equation in e(n) we obtain e (n) = 


[1 — 2u(n)x" (n)x(n)le(n). 





Fe) =x" (n)x(n)—2x" T ET OTT or u(t) 
Hence, gu(n) 
= 1/[2x" (n)x(n)]. 
OT 
8.4.1: 


Jn) = [d(n) — w'(n)x(n)P + yw" (2) w(n) = a(n) — 2d(n)w" (1) x(n) 


+ w' (n)x(n)w! (11)x(n) + yw'(n)w(n) (1) 
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(Note that [w!(n)x(n)? = w!(n)x(n)[w!(n)x(n)] since w'x(1) is a number.) The ith 
component of (1) is J(n) = d*(n)- 2d(n)w,(n)x(n) + [w (nx; (mF + yw? (n) (2). 
Therefore, dJ(n/ow, = 0 = -2d(n)x,(n)+ 2w,(n)x,(n) + 2yw,(n) (3). In matrix 
form it becomes VJ = —e(n)x(n) + y w (n) (4). Introducing (4) into the recursion 
equation of the gradient descent algorithm, we obtain 


w(n+1)= w(n)- u VJ = w(n)+ ue(n)x(n)- u yw(n) = (1— u y)w(n) + ue(n)x(n). 
8.5.1: 
J = Ef[d(n)- w" (n)x(n)][d(n) - x" (n)w(n)]+ Ale" w(n) - a)} 
= E{d?(n)—d(n)w' (n)x(n) — d(n)x" (n)w(n)+ w (n)x(n)x! (n)w(n) 
+ Ac’ w(n)— Aa} 
= o? - w? (n)E{d(n)x(n)} — wE{d(n)x(n)} + wt (n)E{x(n)x? (n)w(n) 
+ Ae’ w(n)—Aa 
—2w'(n)p, +w" (n)R,w(1)} + Ac! w(1) - Aa 


0 


; 
*—2w"(n)p, +(w(n)—w?)' R, (w(n)— w’) + wR, w(n) + w' (n)R, w 
+ W'R w° + Ale™(w(n)— w")-(a—e"w")] 
=o, WPa tE REHA E-a) 
where RW =P (wR w(n))' = w"(n)R w° (R, = symmetric). 


8.7.1: 
The k value is 


Wai. =W, + 2uX; E, = W, ,+24X;[D;(k) - Y,(K)] = W, ,+ 2UX; (WID, (k) 
- WX; (K)] 
=W, , + 2X; (k)D,(k) — 24W, , | X,(k) P= (1-241 X; (k) PW, , 
+ 2uD,(k)X; (k) 
8.7.2: 


The z-transform and the ensemble are linear operations and can be inter- 
changed. Therefore, the other z-transform is 


2W,(z) - zW, = (1- 2WE(| X,(k) PW, (2) + [2HEID, (K)X} (VA 27) 0, 
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where W, , =0 since_it was assumed that the initial conditions have zero 
values, and W,(z)= 4F{W, ,}z'. Multiplying (1) by z7! and (z — 1), and apply- 
ing the final value theorem (see Table 2.3.1), we obtain 


2HE{D,(k)X;(k)}  _ EID; (X; (k) 


7 (2) 


E{W,” } = lim{z — 1)W, (z)} = lim 2 a 
ar 1 1-(1-24ElX (o z EX) 





8.7.3: 


X (O DEW, ,} + 2UE(D (KX? (k) 





Ea (k) = E{W, i} co E{(We} Sil 2 Et 


E{D.(k)X: (k 2 2 E{D.(k)X"(k 
a = (1-2 (|X,(k), DEW, }- (1 - 2HE (|X, (k) ——— > ai 


Elx (o) Ef 








= (1- 2uE(|X,() DE, 





chapter 9 


Least squares and recursive 
least-squares signal processing 


9.1 Introduction to least squares 


The Wiener and adaptive filters belong to the statistical framework since the 
signal statistics are being invoked and it is required that a priori knowledge 
exists of the second-order moments. On the other hand, the method of least 
squares belongs to the deterministic frame. In addition, this method requires 
that both the input signal and the desired one be measured. There are several 
important cases that such restrictions of signal measurements can be applied, 
such as modeling applications, linear predictive coding, and communica- 
tions, where the desired signal is taken to be the training set. 


9.2 Least-square formulation 


We consider a linear adaptive filter with coefficients at time n 
w(n) =[w,(n) w,(n)---w,,()]', a measured real-valued input vector 


x(n) =[x,(1) x, (n) --- x,, (n)]', and a measured desired response d(n). 


Note that no structure has been specified for the input vector x(n), and 
therefore, it can be considered as the successive samples of a particular 
process or as a snapshot of M detectors as shown in Figure 9.2.1. Hence, the 
problem is to estimate the desired response d(n) using the linear combination 


M 
y(n) = w" (n)x(n) = Vw, (nx (n) n=1,2,--,N (9.2.1) 
k=1 
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(a) 


x(1) x(2) x(3) ... x(M) x(M+1) x(M+2)... x(N-1) x(N) 


Ce he a 
x(1) 


x(3) 


Figure 9.2.1 (a) Multisensor application. (b) Single sensor application. 





The above equation can be represented by a linear combiner as shown in 
Figure 9.2.2. The estimation error is defined by the relation 


e(n) = d(n)— y(n) = d(n)— w! (n)x(n) (9.2.2) 


The coefficients of the adaptive filter are found by minimizing the sum of 
the squares of the error (least squares) 


J=E= 5 e(nje?(n) (9.2.3) 





y(n) 


wg) 


X3(11) 


Figure 9.2.2 Linear estimator (M-parameter system). 
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where g(n) is a Weighting function. Therefore, in the method of least squares 
the filter coefficients are optimized by using all the observations from the 
time the filter begins until the present time and minimizing the sum of 
the squared values of the error samples that are equal to the measured 
desired signal and the output signal of the filter. The minimization is valid 
when the filter coefficient vector w() is kept constant, w, over the mea- 
surement time interval 1 <n < N. In statistics, the least-squares estimation 
is known as regression, e(n) are known as signals, and w is the regression 
vector. 
We next define the matrix of the observed input samples as 


: — data records (M xN) 
; : snapshots 


(9.2.4) 


where we assume that N > M. This defines an over-determined least-squares 
problem. 

For the case in which we have one dimensional input signal, as shown 
in Figure 9.2.1b, the data matrix takes the form 


x(M) x(M +1) -= xN) 


x(M-1) xM)» x(N —1) 
X” =|x(M-2) x(M-—1) =- x(N-2) (9.2.5) 
x(1) x(2) + x(N-M+1) 


The output y, the error e, and the data vectors x, are: 


y = Xw (9.2.6) 


e=d-y (9.2.7) 
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where 


y=[yQ) y(2) --- y(N)]' = filter output vector (N x1) (9.2.8) 
d=[d(1) d(2) --- d(N)]' (9.2.9) 

e=[e(1) e(2) --- e(N)]' =error vector (N x1) (9.2.10) 

x=[x (n) x(n) == x, (|! =snapshot (N x1) (9.2.11) 


x, =[x,(1) x,(2) =- x (N)]' =data vector, k=1,2,-,M (9.2.12) 
w =[w w, wl = filter coefficients, (M x1) (9.2.13) 
X=[x X eX] =(N xM) (9.2.14) 
In addition, with ¢(n) = 1 for all n, (9.2.3) takes the form 


J =e'e=(d—y)' (d-y)=(d-Xw)' (d-Xw) 
=d'd-—w'X'd-d'Xw+w'X'Xw (9.2.15) 


=E,-w'p-p'wtw Rw=E,-2p'w+w'Rw 


where 
E,=d'd= Y andon (9.2.16) 
n=1 
N 
R=X'X= % x(n)" (n) (MxM) (9.2.17) 
p=X'd= > x(njd(n) (Mx1) (9.2.18) 
y=Xw= S wx, (N x1) (9.2.19) 
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The matrix R becomes time averaged if it is divided by N. In statistics, the 
scaled form of R is known as the sample correlation matrix. 

Setting the gradient of J with respect to the vector coefficients w equal 
to zero, we obtain (see Problem 9.2.1) 


Rw=p; p'=w' R'=w'R (R issymmetric) (9.2.20) 
or 
w=R"p (9.2.21) 
Therefore, the minimum sum of squared errors is given by 
J „œa = a'd-2p'R'pt+tw'RR'p=E,-p'R'p=E£,-pw (9.2.22) 


since R is symmetric. For the solution given in (9.2.21) see Problem 9.2.3 and 
Problem 9.2.4. 


Example 9.2.1: Let the desired response be d = [1 1 1 1], and the two 
measured signals be x, = [0.7 1.4 0.4 1.3]!, x, = [1.2 0.6 0.5 1.1]'. Then we 


obtain 


0.7 1.2 
0.7 14 04 1311.4 0.6 4.30 3.31 
R=xX!'X= = 
1.2 06 05 11104 05 3.31 3.26 
1.3 1.1 


3.8] _ 0.3704 
p=X'd= P s w=R'p= ere P J nin = 0.3252 


y =X w =[1.0595 0.9187 0.4816 1.2150] 


The least-squares technique is a mathematical procedure that enables us 
to achieve a best fit of a model to experimental data. In the sense of the 
M-parameter linear system, shown in Figure 9.2.3, (9.2.1) is written in the 
form 


y(n)=w x (n)+w,x (n)+ = +w x(n) n=1,2,--,N (9.2.23) 
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X 
i System parameters 






X2 






Wp Wo ety WM 
XM 


Figure 9.2.3 An M-parameter linear system. 
The above equation takes the following matrix form 
y=Xw (9.2.24) 


To estimate the M parameters w, it is necessary that N 2 M. If N= M, 
then we can uniquely solve for w to find 


w=X'y (9.2.25) 


provided X7 exists. w is the estimate of w. Using the least-error-squares we 
can determine w provided that N > M. 
Let us define an error vector e=[e, ¢,---e,,]' as follows: 


e=y-Xw (9.2.26) 


Next we choose w in such a way that the criterion 


N 
J= e? =ele (9.2.27) 
~ 
is minimized. To proceed we write 


J =(y-Xw)'(y— Xw) 


=y'y-w X'y-yXw+w 'X'Xw 


(9.2.28) 


Differentiating J with respect to w and equating the result to zero for deter- 
mining the conditions on the estimate w that minimizes J. Hence, 


an = -2X'y +2X'XWw = 0 (9.2.29) 


X'Xw = X'y (9.2.30) 
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from which we obtain 


w =(X1X) X'y (9.2.31) 


The above is known as the least-squares estimator (LSE) of w. (9.2.30) is known 
as the normal equations. 

If we weight differently each error term, then the weighted error criterion 
becomes 


J, =e'Ge =(y— Xw)'G(y- Xw) (9.2.32) 


The weighting matrix G is restricted to be symmetric positive definite 
matrix. Minimizing Jg with respect to w results in the following weighted 
least-squares estimator (WLSE) w: 


w =(X'GX)'X'Gy (9.2.33) 


If G = I then w=wG. 


Statistical properties of least-squares estimators 
We rewrite (9.2.26) in the form (X = deterministic matrix) 
y=Xw+e (9.2.34) 


and assume that e is a stationary random vector with zero mean value, 
E[e] = 0. Furthermore, e is assumed to be uncorrelated with y and X. There- 
fore, on the given statistical properties of e, we wish to know just how good, 
or how accurate, the estimates of the parameters are. 

Substituting (9.2.34) in (9.2.31) and taking the ensemble average we 
obtain 


E{w} = Elw + (XTX) Xe} = Efļw}+ E{X'X)'X}E{e} 
(9.2.35) 
=w (Eļe}=0) 


which indicates that w is unbiased. 
The covariance matrix corresponding to the estimate error W-w is 


C, Ew- Ww- w)"} EOX X y -ww — w)")} 
= E{[X™X)1X7(Xw +e)- w](w-w)7} 
= E{[(X™X)*(X7X)w +(X7X) te- w]j[ŵw - w] } 
(9.2.36) 
= E{{(X*X)*X7e][(X7X)*X7e]"} 
=(X™X)'X"E{ee"}X(X'X)7 


=(X'Xx)? X™R_X(X™X) | (R, is the error correlation marix) 
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If the noise samples e(7) for i= 1, 2, 3... are normal, identical distributed 
with zero mean and variance o° ((e = N(0,o1)), then 


R =E{ee'}=o0°I (9.2.37) 
and, hence, 
C= 07 (X'X)" (9.2.38) 


Using (9.2.34), and taking into consideration that e is a Gaussian random 
vector, then the natural logarithm of its probability density is given by 


In p(e; w) = In 





1 1 1 TAn, 
VE c eo -59-%0 City xw) 
à (9.2.39) 


1 
= —]n(2z o’)? — ano — Xw)! (y - Xw) 


since C= oI and IC_| implies the determinant of C,. Next, we differentiate 
(9.2.39) with respect to the parameter w. Hence, we find 


ome; w) sz ae a [y’y -2y"Xw+w'X'Xw] (9.2.40) 


since y'Xw = w'X'y = scalar. Using the identities below (see Appendix A) 











T T 
a =b, wAn =2Aw (Ais symmetric) (9.2.41) 
(9.2.40) becomes 
olnp(e; w) 1 
-~ zalX'y ~X'Xw] (9.2.42) 


Assuming that X'X is invertible, then 


dln ple; p) _ xX 


Ow gr WCE XY -w]=I(w)[g(w)-w] (9.2.43) 
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From the Crame—Rao lower bound (CRLB) theorem, w is the minimum 
variance unbiased (MVU) estimator, since we have found that 


w =(X'X)'X'y=g(w) (9.2.44) 


and (9.2.43) takes the form 


, T 
one) = TIo S (9.2.45) 


The matrix 


I(w)=X'X/o° (9.2.46) 


is known as the Fisher information matrix. The Fisher matrix is defined by the 
relation 





[I(w)]; =E | 4 ne es a (9.2.47) 
e] 


in the CRLB theorem, and thus, the parameters are shown explicitly. Com- 
paring (9.2.38) and (9.2.46), the MVU estimator of w is given by (9.2.44) and 
its covariance matrix is 


C =I" (w) = 0° X"X)" (9.2.48) 


The MVU estimator of the linear model (9.2.34) is efficient since it attains the 
CRLB or, in other words, the covariance matrix is equal to the inverse of the 
Fisher information matrix. 

Let us rewrite the error covariance matrix in the form 


2 -1 
C -root Ë| Lxx) (9.2.49) 
w N\N 


where N is the number of equations in the vector equation (9.2.34). Let 
lim [(1/N )X'*X}" =A where A is a rectangular constant matrix. Then 


lim C_ = lim 


Noo 


= 0 (9.2.50) 
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Since the covariance is zero as N goes to infinity implies that w = w. The 
above convergence property defines w as a consistent estimator. 

The above development shows if a system is modeled as linear in the 
presence of white Gaussian noise, the LSE approach provides estimators that 
are unbiased and consistent. 


9.3 Least-squares approach 


Using the least-squares (LS) approach we try to minimize the squared dif- 
ference between the given data (or desired data) d(n) and the output signal 
of a LTI system. The signal y(n) is generated by some system, which in turn 
depends upon its unknown parameters w,’s. The LSE of w/s chooses the 
values that make y’s closest to the given data. The measure of closeness is 
defined by the LSE (see also (9.2.15)). For the one-coefficient system model, 
we have 


Jw) = X (a(n) - yor? (9.3.1) 


and the dependence of J on w is via y(n). The value of w that minimizes the 
cost function J(w) is the LSE. It is apparent that the performance of LSE will 
depend upon the statistical properties of the corrupting noise to the signal 
as well as any system modeling error. 


Example 9.3.1: Let us assume that the signal is y(n) = acos(,n), where a, 


is known and the amplitude a must be determined. Hence, the LSE mini- 
mizes the cost function 


}(a) = Yan) -acoso n)” (9.3.2) 


n=l 


Therefore, we obtain 


ta = pS (—)2 cos œo n(d(n) ~—acos@ 1) =0 
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Let us assume that the output of a system is linear, and it is given by 
the relation y(n) = x(n)w, where x(n) is a known sequence. Hence, the LSE 
criterion becomes 


joo) = J (a(n) — xn (9.3.3) 
n=1 
The estimate value of w is 
N 
X dinan) 
w= (9.3.4) 


and the minimum LS error is given by (Problem 9.3.2) 


$no) 


Cs Seo i DY donzen : G z È 
7 = - > x(n) 


n=1 





(9.3.5) 


Example 9.3.2: Consider the experimental data shown in Figure 9.3.1. It 
is recommended that a linear model, y(n) = a + bn, for the data be used. 
Using the LSE approach, we find the cost function 


N 
J(w)= SY (ar) -a-bn? =(d- Xw)"(d—Xw) (9.3.6) 
where 

11 

H T2 
We! 17 X= (93:7) 

b a 
1 N 


From (9.2.31), the estimate value of w is 


w =(X'X)'X"d (9.3.8) 
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0 10 20 30 40 50 60 70 80 90 100 
Figure 9.3.1 Illustration of Example 9.3.2. 


and from the data shown in Figure 9.3.1 


~ |a] | 1.6147 
W=!t,./= 
b| | 0.0337 
The straight line was also plotted to verify the procedure of LSE. The data 
were produced using the equation d(m) = 1.5 + 0.035 + randn for n = 1 to 100. 


9.4 Orthogonality principle 


To obtain the orthogonality principle for the least-squared problem we follow 
the procedure developed for the Wiener filters. Therefore, using the 
unweighted sum of the squares of the error, we obtain 


(WW, Wy °° Dy.) _ etm 
oct a (m)e(m) |= 2 LUU p= 
dw, HS | Sein) 


m=l1 m=1 


(9.4.1) 
But (9.2.6) is equal to (w has M coefficients) 


e(m) = d(m)- X ux, (m) (9.4.2) 
k=1 
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and, therefore, taking the derivative of e(m) with respect to w, and introduc- 
ing the results in (9.4.1), we obtain 


Jw 
ae 2D e (9.4.3) 


We note that when w = w (the optimum value) we have the relation- 
ship A =0 for k = 1, 2, ..., M, and hence, (9.4.3) becomes 


Sands (om =é@x, k=1,2,--,M (9.4.4) 

where l 
e=[e(1) (2) ê(3)-- &ANŅ'=d-9 (9.4.5) 
x=[x (1) x,(2) x (NJ k=1,2,-,M (9.4.6) 


the estimated error e(m) is optimum in the least-squares sense. The above 
result is known as the principle of orthogonality. 


Corollary 
Equations (9.2.6) may be written as the sum of the columns of X as follows 
2 
y= x (n)w, n=1,2, ---,N (9.4.7) 
k=l 


Multiplying (9.4.7) by e and taking into consideration, we obtain 
e'y=0 (9.4.8) 


The above corollary indicates that when the coefficients of the filter are 
optimum in the least-squares sense, then the output of the filter and the error 
are orthogonal. 


Example 9.4.1: Using the results of Example 9.2.1, we find 


1.05953819523825 
. _ | 0.91864528560697 | p i 
e=d-y= ; ê"x 2 505 X10"; ex =1.457 x10 
0.48159639439564 


1.21506254286554 
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9.5 Projection operator 


Projection operator gives another from of interpretation to the solution of the 
least-squares problem. Let us, for clarity, assume that we have 2 (N vectors 
in the N* dimensional case) vectors x, that form two-dimensional subspace 
(see Figure 9.5.1). The vectors x, and x, constitute the column space of the 
data matrix X. 

We note the following: 


1. The vector d is obtained as a linear combination of the data column 
space X,,X,, u Xy of X that constitutes the subspace of M. 

2. From all the vectors in the subspace spanned by o the 
vector d has the minimum Euclidian distance from d. 

3. The difference ê= d- d is a vector that is orthogonal to the subspace. 


We also note that y satisfies the above three properties. From (9.4.7) we 
observe that y is a linear combination of the data column space, which spans 
the subspace. Next, minimizing é'é, where ê=d-d, is equivalent to mini- 
mizing the Euclidian distance between d and y. The third property is satis- 
fied by (9.4.8). Therefore, we can conclude that y is that the projection of d 
into the subspace spanned by the vectors x,,x,,---,X,,- 

Equation (9.4.7) may also be written in the matrix form 


y =Xw=X(X™X)"X"d (9.5.1) 





X1 


Figure 9.5.1 Vector space interpretation of the least-squares problems for N = 3 
(data space) and M = 2 (estimation subspace). 
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where we set w = Rp (see (9.2.21)), R = (X™X)"! (see (9.2.17)), and p = X'd 
(see (9.2.18)). Since the matrix 


P = X(X™X) 1X" (9.5.2) 


projects the desired vector in the N-dimensional space to y in the M-dimen- 

sional subspace (N > M), it is known as the projection matrix or projection 

operator. The name is due to the fact that the matrix P projects the data vector 

d onto the column space of X to provide the least-squares estimate y of d. 
The least-squares error can be expressed as 


é=d-y=d-—Pd=(1-P)d (9.5.3) 


where I is an N x N identity matrix. The projection matrix is equal to its 
transpose (Hermitian for complex matrix) and independent, that is 


P=P'; p?=Pp'p=P (9.5.4) 


The matrix I-P is known as the orthogonal complement projection operator. 
The filter coefficients are given by 


w =R"p=(X'X)'X"d (9.5.5) 
where 


X* =(X'X)'Xx! (9.5.6) 


is an M x N matrix known as the pseudo-inverse or the Moore-Penrose gen- 
eralized inverse of matrix X (see Appendix A). 


Example 9.5.1: Using the data given in Example 9.2.1, we obtain 


0.7278 -0.2156 0.2434 0.3038 

-0.2156 0.7762 0.0013 0.3566 
P=X(X'X)'X' = 

0.2434 0.0013 0.0890 0.1477 


0.3038 0.3566 0.1477 0.4068 


y = Pd=[1.0595 0.9186 0.4815 1.2150] 
e=(I-P)d=[0.0595 0.0813 0.5184 —0.2150]' 
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9.6 Least-squares finite impulse response filter 
The error of the filter is given by 


M 
e(n)=d(n) -5 w x(n- k)=d(n)- w'x(n) (9.6.1) 
k=l 


where d(n) is the desired signal, 
x(n)=[x(n) x(n—-1) - x(n-M+1)]' (9.6.2) 
is the input data to the filter, and 
w=[w, w, = wI (9.6.3) 


is the filter coefficient vector. 

It turns out that the exact form of e, d, and X depends on the range 
N SASN of the data to be used. Therefore, the range of the square-error 
summation then becomes 


n=N , 


JÊE= > e? (n)=e"e (9.6.4) 


n=N. 
L 


The least-squares finite impulse response filter is found by solving the 
least-squares normal equations (see (9.2.20) and (9.2.30)) 


(X'X)w=X'd=p (or Rw=p) (9.6.5) 
with the minimum least-squares error 
A = sie 
J =E SE -pw (9.6.6) 


where E; = d'd is the energy of the desired signal. The elements of the 
time-averaged correlation matrix R are given by (the real averaged correla- 
tion coefficients must be divided by N,- Nj) 


N; 
r= xix, = $ x(nt1-ia(n+1-f) 1<i, {<M (9.6.7) 
n=N, 
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There are two important different ways to select the summation range 
N,<n<N,, which are exploited in Problem 9.6.1. These are the no-window 
case, where N; = M— 1 and Ne=N-1, and the full-window case, where the 
range of the summation is from N; = 0 to N= N + M - 2. The no-window 
method is also known as the autocorrelation method, and the full-window 
method is also known as the covariance method. 

The covariance method data matrix D is written as follows: 


x(M) x(M +1) --- x(N) 


x(M—-1) xM) =- x(N-1) 
D' =[x(M) x(M+1) --- x(M)]= . l 


x(1) x(2) +- {N-M+1) 
(9.6.8) 


Then the M x M time-averaged correlation matrix is given by 


N 
R= x(n)x'(n)=D'D (9.6.9) 


n 


Book MATLAB function for covariance data matrix 
function[dT]=aadatamatrixcovmeth(x,M) 
%function[dT]=aadatamatrixconvmeth (x,M) 
%M=number of filter coefficients;x=data vector; 
$dT=transposed data matrix; 
for m=1:M 

for n=1:length(x)-M+1 

dT (m, n) =x (M-m+n) ; 

end; 

end; 


Example 9.6.1: If the data vector is x = [0.7 1.4 0.4 1.3 0.1]! and the 
data filter has three coefficients, then 


0.4 1.3 0.1 
D'=|14 04 13 (D' is Toeplitz matrix) 
0.7 14 0.4 
1.8600 1.2100 2.1400 


R=D'D=!1.2100 3.8100 2.0600 
2.1400 2.0600 2.6100 





188 Adaptive filtering primer with MATLAB 


The data matrix in (9.6.8) has the following properties: 


Property 1: The correlation matrix R is equal to its transpose (for 
complex quantities, R is Hermitian R = R1). The proof is directly 
found from (9.6.9). 

Property 2: The correlation matrix is nonnegative definite, a'Ra>0 for 
any M x 1 vector a (see Problem 9.6.1). 

Property 3: The eigenvalues of the correlation matrix R are all real and 
nonnegative (see Section 5.1). 

Property 4: The correlation matrix is the product of two rectangular Toeplitz 
matrices that are the transpose of each other (see Example 9.6.1) 


The following book MATLAB function will produce the results for the 
no-window method FIR filter: 


Book MATLAB function no-window LS method 


function[R,w,Jmin]=aanowindowleastsqufir(x,M,d) 
%x=data of length N;M=number of filter coefficient; 
%d=desired signal={d(M) d(M+1) ee a(n} 
N=length (x) ; 
for i=1:M 

for j=1:N-M+1 

Dir JI =X Maa); 

end; 
end; 
Dt=D'; 
R=D3 Dt; 
p=D*d(1,1:N-M+1)'; 
w=inv (R) *p; 
Jmin=d'*d-p'*w; 


9.7 Introduction to RLS algorithm 


The least-squares solution (9.2.21) is not very practical in the actual imple- 
mentation of adaptive filters. This is true, because we must know all the past 
samples of the input signal, as well as the desired output must be available 
at every iteration. The RLS algorithm is based on the LS estimate of the filter 
coefficients w(n — 1) at iteration n — 1, by computing its estimate at iteration n 
using the newly arrived data. This type of algorithm is known as the recursive 
least-squares (RLS) algorithm. This algorithm may be viewed as a special case 
of the Kalman filter. 

To implement the recursive method of least squares, we start the com- 
putation with known initial conditions and then update the old estimate 
based on the information contained in the new data samples. Next, we 
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minimize the cost function J(7), where n is the variable length of the observed 
data. Hence, we write (9.2.3) in the form 


J(n)= 5 n (k)e?(k), n,(k) = weighting factor (9.7.1) 
where . 
e(k) = d(k) — y(k) = d(k) — w" (k)x(k) (9.7.2) 
x(k) =[x(k) x(k-1) = x(k-M+D)J (9.7.3) 
w(n)=[w,(n) w,(n) = wy (ml! (9.7.4) 


Note that the filter coefficient are fixed during the observation time 1<k <n 
during which the cost function J(n) is defined. 

In standart RLS algorithm, the weighting factor 7,(k) is chosen to have 
the exponential form 


n, (A) = ae k = L 2; An (9.7.5) 


where the value of À is less than one and, hence, n,(k) is confined in the range 
O0< 17 (k) <1 for k = 1, 2,..., n. The weighting factor A is also known as the 
forgetting factor, since it weights (emphasizes) the recent data and tends to 
forget the past. This property helps in producing an adaptive algorithm with 
some tracking capabilities. Therefore, we must minimize the cost function 


Jn) = ` Ate? (k) (9.7.6) 
k=1 


The minimum value of J(n) is attained (see Section 9.2) when the normal 
equations (see (9.2.20) 


R,(nw=p,(n) (=R p(n) (9.7.7) 


are satisfied and where the M x M correlation matrix R,(n) is defined by (see 
Problem 9.7.1) 


R (n)= Y Axx") = X™AX (9.7.8) 


k=1 
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and 


p,(n) = X AxA) =X"Ad; A=diag[A™! A"? ..-1] (9.7.9) 


k=1 


Note that R,() differs from R in the following two respects: 1) the common 
matrix x(k)x'(k) is weighted by the exponential factor A”*, 2) the use of 
prewindowing is assumed, according to which the input data prior to time 
k = 1 are zero and, thus, k = 1 becomes the lower limit of the summation; the 
same is true for p(n). 

The minimum total square error is (see Problem 9.3.2) 


Tin =d (1) Ad(n)— w"(n)p (1) = Y adh —w'(n)p,(n) (9.7.10) 


k=1 


Next, we wait for a time such that n > M, where in practice R, is nonsingular, 
and then compute R, and p,(n). Next we solve the normal Equations (9.7.7) to 
obtain the filter coefficients w(n). This is repeated with the arrival of the 
new pairs {x(n), d(n)}, that is, at times n+ 1, n + 2,.... 


If we isolate the term at k =n, we can write (9.7.7) in the form 


n—l 


R (n)=A Š Am URS (kx? a) +x(n)x"(n) (9.7.11) 
k=l 


By definition the expression in the bracket is R,(” — 1), and thus, (9.7.11) 
becomes 


R (n)=AR,(n-1)+x(n)x" (n) (9.7.12) 


The above equation shows that the “new” correlation matrix R,(1) is updated 
by weighting the “old” correlation matrix R,(n — 1) with the factor and adding 
the correlation term x(n)x'(7). 

Similarly, using (9.7.9) we obtain 


p, (n)=Ap,(n—1)+x(n)d(n) (9.7.13) 


which gives an update of the cross-correlation vector. 
Next, we try to find w by iteration and thus avoid solving the normal 
Equations (9.7.7). 
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The matrix inversion lemma 


There are several relations, which are known as the inversion lemma. Let A 
be an M x M invertible matrix and x and y be two M x 1 vectors such that 
(A + xy’) is invertible. Then we have (see Problem 9.7.3) 


Ax TA 
A+txy y =A 7-2 2 9.7.14 
(A+xy ) l+y'A x ( ) 
Next, let A and B be positive-definite M x M matrices related by 
A=B'+cCD'c' (9.7.15) 


where D is a positive-definite matrix of N x M and C is another M x N 
matrix. The inversion lemma tell us that (see Problem 9.7.4) 


A?=B-BC(D+C'BC)'C'B (9.7.16) 


Furthermore, (9.7.14) can also be in the form 


E 4 aAŤxx" AŤ 
(A +axx") 1 = A es pay A Ty (9.7.17) 
; -1 4 -1 Tr3-14Ą-1 
(AA+xx!y! =j 7A AA )xx CTA’) (9.7.18) 


1+A”*xTA lx 


The RLS algorithm 


To evaluate the inverse of R,(1) we set A = AR,(n — 1) and comparing (9.7.12) 
and (9.7.18) we find 


APR (n-1)x(n)x" (nR (n-1) 
1+ A“x" (n)RA(n—Dx(n) 





R '(n) =A R (n-1)- (9.7.19) 


The same relation is found if we set A = R,(n), B = AR, (n — 1), C = x(n), and 
D = 1 in (9.7.16). Next we define the column vector g(n) as follows: 


1+47x (MR (n-1x(n) A+x" (nR (n-1)x(n) 





g(n) = 


This vector is known as the gain vector. 
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